Site Reliability Engineer
1 week ago
**About ByteDance**
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
**Why Join Us**
At ByteDance, our people are humble, intelligent, compassionate and creative. We create to inspire - for you, for us, and for millions of users across all of our products. We lead with curiosity and aim for the highest, never shying away from taking calculated risks and embracing ambiguity as it comes. Here, the opportunities are limitless for those who dare to pursue bold ideas that exist just beyond the boundary of possibility. Join us and make impact happen with a career at ByteDance.
**About the Team**
Our infrastructure team is seeking experienced site reliability engineers to build a globally distributed platform for provisioning and deploying edge services such as traffic acceleration, CDN cache, and gaming. We use Kubernetes to manage on-prem and cloud nodes and build an ecosystem around it, including tools for monitoring, alerting, logging, and CI/CD, along with various services with automated deployment and scaling, in order to maximize daily operational efficiency. On top of the Kubernetes infrastructure, we build a PaaS platform to help deploy and manage global edge services.
**Responsibilities**
- Deploy and administer Kubernetes clusters both on-prem and in the cloud (AWS, GCP, etc.).
- Collaborate with software engineers to build an enterprise-level platform (PaaS) with cutting-edge Cloud Native Computing Foundation (CNCF) technologies.
- Design, develop, automate, and continuously improve platform services and pipelines, such as monitoring, alerting, logging, tracing, CI/CD, etc.
- Improve Kubernetes system efficiency and debug issues related to networking, storage, scheduling, etc.
- Collaborate with open-source communities to advance Kubernetes and Cloud Native technologies.
**Qualifications**
- Master’s degree (or Bachelor's degree with 5+ years of experience) in Computer Engineering, Computer Science, or related fields.
- Experience in Kubernetes administration.
- Experience in Unix/Linux systems from kernel to shell and beyond.
- Experience with Kubernetes CNI deployment and troubleshooting, including (but not limited to) the following CNIs: Cilium, Kube-Router, Calico, Flannel.
- Experience in designing, analyzing, and building automation tools for large-scale, complex systems.
**Preferred Qualifications**
- CKA (Certified Kubernetes Administrator) certification.
- Experience in using and contributing to open-source projects in the Kubernetes ecosystem, e.g. Kubespray, CNI, Helm, KubeEdge, Istio/Linkerd, Prometheus, ArgoCD, OPA, Harbor, Envoy, etc.
- Experience in networking technologies such as TCP/IP, BGP, DNS, load balancers, etc.
- Experience in CI/CD pipeline design and development.
- Experience in Kubernetes API, Operator, and Custom Resource Definition (CRD) development.
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
-
Site Reliability Engineer
4 weeks ago
Singapore PERSOLKELLY Full time
We have partnered with a renowned global leader in information and communications technology (ICT) infrastructure and smart devices. They are providing full-stack, all-scenario solution for products and services carriers, enterprises, governments, and individual consumers worldwide. Our client is looking for enthusiastic Site Reliability Engineer to...
-
Site Reliability Engineer
2 weeks ago
Singapore ByteDance Full time
About the Team: The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet, providing cloud solutions and various infrastructure services and ensuring that they are scalable and reliable. Responsibilities: Build, expand,...
-
Site Reliability Engineer
4 weeks ago
Singapore TRUEWATCH TECHNOLOGY INC PTE. LTD. Full time
**Roles and Responsibilities**
The Site Reliability Engineer plays a crucial role in ensuring the availability, reliability, and performance of our production environment. Monitor system health and take a holistic view to ensure optimal operation. Implement site reliability automation to minimize downtime and reduce costs. Manage...
-
Site Reliability Engineer
1 week ago
Singapore Sea Limited Full time
Engineering and Technology - Infrastructure, Singapore - Entry Level
Our DevOps Engineering team plays an important role in developing and maintaining the internal systems and tools for the Infrastructure team. As a Site Reliability Engineer, you are responsible for improving the availability and reliability of our Infrastructure services. - Responsible for...
-
Site Reliability Engineer
10 hours ago
Singapore TikTok Full time
Site Reliability Engineer - Data Management Suite
About the Team: The Data Management Suite team is building products that cover the whole lifecycle of the data pipeline, including data ingestion and integration, data development, data catalog, data security, and data governance. These products...
-
Site Reliability Engineer
3 days ago
Singapore TRUEWATCH TECHNOLOGY INC PTE. LTD. Full time
**Responsibility**:
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Achieve site reliability automation, minimize system downtime, and reduce site reliability cost.
- Manage risks and resolve issues that affect the release scope, schedule, and quality.
- Suggest architecture improvements, push for...
-
Site Reliability Engineer
16 hours ago
Singapore Manpower Singapore Full time
Responsibilities: Responsible for deployment, change, issue triage, and infrastructure management of overseas games and relevant components and systems, e.g. game monitor system, login services....
-
Senior Site Reliability Engineer
2 hours ago
Singapore Manus AI Full time
1. Manage and maintain container clusters and other open-source component clusters across various business lines
2. Build and enhance infrastructure operation platforms, including infrastructure management, CI/CD, monitoring/alerting, and logging systems
3. Respond quickly to incidents and implement effective...
-
Site Reliability Engineer
2 weeks ago
Singapore beBee Careers Full time
We are seeking a talented Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our systems.
Job Description: We are looking for an experienced engineer who can develop, support, administer, and consult on the SecDb runtime environment....
-
Site Reliability Engineer, SealSuite
3 weeks ago
Singapore ByteDance Full time
About the Team: Our team is dedicated to elevating the level of cybersecurity to fully support ByteDance as well as our clients' digital journey. We aim high at building next-generation cybersecurity. Rooted in years of practical experience in the enterprise security domain within ByteDance, the team now runs as a business. We provide a...