
Reliability Architect
1 week ago
Job Title
Reliability Engineering Manager
We are seeking an experienced Reliability Engineering Manager to lead a high-impact team focused on building robust, scalable infrastructure and ensuring platform reliability across our cloud environments. This role combines strategic leadership with deep technical expertise in automation, observability, and modern DevOps practices to drive operational excellence and service uptime.
This position is open exclusively to candidates who reside in and are authorized to work in the designated location.
As part of our recruitment process, we may collect personal data to support hiring-related activities such as screening, assessment, and communication.
Key Responsibilities:
- Define and implement reliability roadmaps aligned with business objectives and service level agreements.
- Collaborate with service owners to define service level objectives supporting service level agreement commitments.
- Deliver platform performance insights through reports and observability tools.
- Integrate reliability best practices into engineering and product workflows.
- Lead initiatives on uptime, monitoring, incident response, and optimization.
- Manage incident response processes, on-call rotations, and playbooks.
- Set infrastructure reliability standards for cloud-native environments.
- Optimize architecture for scalability, fault tolerance, and cost efficiency.
- Ensure production systems meet security and compliance requirements.
- Provide strategic leadership and mentorship to drive team growth and performance.
- Design scalable and resilient systems architecture.
- Recruit, mentor, and retain high-performing reliability engineers.
- Develop growth and training plans for reliability team members.
- Foster a reliability-focused, customer-centric team culture.
Requirements:
- Bachelor's degree in Computer Science or a related field.
- Cloud certification in AWS, Azure, or GCP preferred.
- 8+ years in Software Engineering or Site Reliability Engineering.
- 3+ years in team management or technical leadership.
- Expert-level Linux administration, scripting, and troubleshooting.
- Strong hands-on experience with continuous integration/continuous delivery and software development life cycle practices.
- Deep passion for automation, security, and self-service.
- Proficient in AWS, GCP, and/or Azure cloud platforms.
- Skilled in infrastructure-as-code tools like Terraform, CloudFormation, Helm, and Ansible.
- Experienced with containers, Kubernetes, and microservice architectures.
- Excellent verbal and written communication skills.
Benefits:
- Competitive compensation.
- Hybrid work model.
- 18 days of annual leave (with accrual up to 20 days).
- Public holiday entitlement.
- Other leave benefits.
- Health insurance for you and your dependents.
- Growth opportunities.
- A global company offering meaningful work, highly skilled colleagues, and an amazing culture.
-
Reliability Architect
6 days ago
Singapore beBeeRelevance Full time $90,000 - $120,000
Job Title: Reliability Architect
As a key member of our team, you will play a vital role in ensuring the stability and performance of critical services. Your expertise will help bridge the gap between development and operations, guaranteeing robust, scalable, and responsive infrastructure. Key Responsibilities:
-
Backend Software Engineer
4 days ago
Singapore Refine Group Full time
Responsibilities
Team introduction: Build Reliability at Global Scale. Every time a short video is posted or viewed on TikTok, our team is working behind the scenes to make sure it happens instantly and reliably. The Short Video Reliability team blends deep systems expertise with large-scale architecture design to keep TikTok running smoothly for billions of...
-
Site Reliability Engineer
2 weeks ago
Singapore TP-LINK CORPORATION PTE. LTD. Full time
Responsibilities:
- Serve as technical SME for implementing and operating microservices on Kubernetes cloud-based platforms.
- Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform.
- Perform load tests and chaos tests to ensure the scalability and reliability of microservices.
- Build...
-
Site Reliability Engineer
6 days ago
North-East Singapore PERSOLKELLY Full time
The Site Reliability Engineer is responsible for ensuring the reliability, scalability, and efficiency of our systems and infrastructure. This role involves monitoring, troubleshooting, and resolving issues to maintain optimal performance. The engineer will also collaborate with cross-functional teams to automate processes and improve system reliability....
-
Site Reliability Engineer
4 days ago
Singapore ABAXX SINGAPORE PTE. LTD. Full time
Site Reliability Engineer - Networking
We are seeking a competent candidate to join our Infrastructure Team, whose mission is to build and operate a MAS-regulated marketplace and clearing house. This role is ideal for someone with a strong foundation in AWS services, infrastructure as code, and cloud security, who is passionate about building scalable, secure,...
-
Reliability Engineer
1 week ago
Singapore HYPERSCAL SOLUTIONS PTE. LTD. Full time
COMPANY DESCRIPTION
NTUC Enterprise Co-operative Limited is the holding entity and single largest shareholder of the NTUC group of Social Enterprises. We aim to create a greater social force to do good by harnessing the capabilities of the social enterprises to meet pressing social needs in areas like health and eldercare, childcare, daily essentials,...
-
Reliability Engineer
1 hour ago
Singapore NE Digital Full time
COMPANY DESCRIPTION
NE Digital is the digital, data and technology organization that serves as a center of excellence to drive digital transformation for our group of NTUC Social Enterprises to meet the critical social needs of Singapore's community. Delivering innovative products and solutions, we empower our people to lead a better and meaningful life...
-
System Reliability Specialist
2 weeks ago
Singapore beBeeInfrastructure Full time $90,000 - $120,000
Reliable Systems Engineer
We're looking for a skilled engineer to join our team and help us build and maintain reliable systems.
- Treat infrastructure and operations as software engineering problems.
- Design and architect new solutions to improve system agility.
- Optimize existing solutions to reduce downtime and increase efficiency.
Key Responsibilities: Manage AWS,...
-
Site Reliability Engineer
4 days ago
Singapore TP-LINK CORPORATION PTE. LTD. Full time
Responsibilities:
- Serve as technical SME for implementing and operating microservices on Kubernetes cloud-based platforms.
- Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform.
- Perform load tests and chaos tests to ensure the scalability and reliability of microservices.
- Build Observability for...
-
Reliability Engineer
2 weeks ago
Singapore NE Digital Full time
COMPANY DESCRIPTION
NE Digital is the digital, data and technology organization that serves as a center of excellence to drive digital transformation for our group of NTUC Social Enterprises to meet the critical social needs of Singapore's community. Delivering innovative products and solutions, we empower our people to lead a better and meaningful life...