Site Reliability Engineer
5 days ago
Key Responsibilities
- Reliability Engineering: Define and implement SLIs, SLOs, and error budgets to measure and improve service reliability.
- Cloud Infrastructure: Design, deploy, and manage infrastructure on Google Cloud Platform (GCP) or other major cloud providers.
- Kubernetes Operations: Administer and optimize GKE (Google Kubernetes Engine) clusters, ensuring high availability and performance.
- Support & Incident Management: Participate in on‑call rotations and handle L2/L3 support for production systems. Lead incident response, root cause analysis, and postmortems. Collaborate with teams to reduce MTTR and improve incident workflows.
- Automation & Tooling: Develop tools and scripts in Python, Go, or Bash to automate operational tasks and improve system efficiency.
- Monitoring & Observability: Implement and maintain monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, ELK, or Stackdriver.
- API Management: Build and maintain internal APIs and integrations that support platform operations and automation.
- Infrastructure as Code: Use tools such as Terraform and Helm, together with GitOps practices, to manage infrastructure in a scalable and repeatable way.
- Collaboration & Culture: Work closely with development, QA, and product teams to embed reliability into the software development lifecycle.

Required Qualifications
- 5–10 years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Strong hands‑on experience with cloud platforms, especially GCP.
- Proficiency in scripting/programming (Python, Go, Bash).
- Deep understanding of Kubernetes, with hands‑on experience in GKE.
- Solid knowledge of SQL and relational database systems.
- Experience implementing and managing SLIs/SLOs and reliability metrics.
- Familiarity with RESTful APIs and microservices architecture.
- Strong troubleshooting and debugging skills in distributed systems.
- Excellent communication and collaboration skills.
Preferred Qualifications
- Cloud certifications (e.g., GCP Professional Cloud Engineer).
- Experience with incident management platforms (e.g., PagerDuty, Opsgenie).
- Exposure to DevOps practices, CI/CD pipelines, and agile methodologies.
- Experience with security and compliance in cloud environments.
Site Reliability Engineer
3 weeks ago
Singapore, Singapore | NetEase Games | Full time
Overview: As a leading internet technology company based in China, NetEase provides premium online services centered around content creation and operates a broad gaming ecosystem.
Job Description: Site Reliability Engineering (SRE) refers to using software engineering methods to manage...
Site Reliability Engineer
3 weeks ago
Singapore, Singapore | APPLE SOUTH ASIA PTE. LTD. | Full time
Summary: At Apple, new ideas have a way of becoming excellent products, services, and customer experiences very quickly. Bring passion and dedication to your job and there’s no telling what you could accomplish. The people here at Apple don’t just build products; they craft the kind of wonder that’s revolutionized entire industries. It’s the...
Site Reliability Engineer
2 weeks ago
Singapore, Singapore | PERSOL SINGAPORE PTE. LTD. | Full time
Overview: An excellent Site Reliability Engineer (SRE) opportunity is available in a cutting-edge, fast-growing cloud environment.
Job Purpose: Deliver reliable, secure, and scalable cloud services by managing and optimizing AWS infrastructure.
Job Responsibilities: Manage and support AWS services, ensuring uptime,...
Cloud Site Reliability Engineer
2 weeks ago
Singapore, Singapore | PERSOL SINGAPORE PTE. LTD. | Full time
Cloud Site Reliability Engineer (AWS): An excellent Cloud Site Reliability Engineer opportunity has just arisen at a global brand supporting mission‑critical government systems.
Job Purpose: Ensure reliable, secure, and automated cloud operations that support mission‑critical systems and compliance needs.
Responsibilities: Manage and support AWS cloud...
Site Reliability Engineer
2 weeks ago
Singapore, Singapore | Crystal Equation Corporation | Full time
Overview: We are seeking a skilled Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for keeping all internal user-facing applications and other production systems running smoothly. This hybrid role combines development and operations skills to build and manage systems that are both efficient and reliable. The...
Site Reliability Engineer
3 weeks ago
Singapore, Singapore | Thales | Full time
Overview: Thales is a global technology leader trusted by governments, institutions, and enterprises to tackle their most demanding challenges. From quantum applications and artificial intelligence to cybersecurity and 6G innovation, our solutions empower critical...
Site Reliability Engineer
2 weeks ago
Singapore, Singapore | E-Solutions | Full time
Job Title: Site Reliability Engineer (SRE)
Experience: 8+ years (including 3+ years in Java)
About the Role: We’re looking for a skilled Site Reliability Engineer with strong Java and cloud-native development experience to design, build, and maintain reliable, scalable systems on Kubernetes and AWS. You’ll work closely with development and platform teams...
Site Reliability Engineer
2 weeks ago
Singapore, Singapore | Razer Inc. | Full time
Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a team located across 5 continents. Razer is...
Site Reliability Engineer
3 weeks ago
Singapore, Singapore | TikTok | Full time
About the team: TikTok Shop is a content e-commerce business that uses international short-video products as carriers. Our aim is to become the preferred choice for users seeking to discover and purchase affordable, high-quality products. We provide users with tailored, vibrant, and efficient consumption experiences while enabling...
Site Reliability Engineer
2 weeks ago
Singapore, Singapore | Manpower Singapore | Full time
Site Reliability Engineer - Global Support
Responsibilities: Deploy and manage overseas game infrastructure, including the game monitoring system and login services. Monitor game systems and build observability dashboards to ensure reliability, scalability, and security. Analyze game...