Senior Site Reliability Engineer

5 days ago


Singapore VANGUARD SOFTWARE PTE. LTD. Full time

Job Summary
We are seeking a Senior Site Reliability Engineer (SRE) to join our growing engineering team. In this role, you will work independently to design, build, and optimize infrastructure and deployment pipelines that ensure the stability, scalability, and security of our systems. You will take full responsibility for automating workflows, improving observability, and enabling development teams to ship code faster and more safely. This is an excellent opportunity for an experienced engineer with at least 5 years of work experience who thrives on ownership, reliability, and technical leadership.

Key Responsibilities
- Infrastructure & Automation: Design, implement, and maintain scalable cloud infrastructure using Infrastructure as Code (IaC) tools.
- CI/CD Pipelines: Build and optimize automated pipelines for testing, deployment, and release management.
- Monitoring & Reliability: Establish observability standards and implement monitoring, logging, and alerting systems to ensure system health.
- Security & Compliance: Enforce best practices for cloud security, access control, and compliance across environments.
- Collaboration: Partner with backend, frontend, and product teams to ensure smooth deployments and reliable system operations.
- Process & Mentorship: Improve DevOps processes, share best practices, and mentor junior engineers.

Job Requirements
- Education: Bachelor's degree in Computing, Software Engineering, IT, or a related field.
- Experience: Minimum 5 years of DevOps, Site Reliability Engineering (SRE), or related experience.
- Tech Stack: Proficient with cloud platforms (AWS, GCP, or Azure), containerization (Docker, Kubernetes), IaC (Terraform, Ansible, Helm), and CI/CD tools (Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD, etc.).
- Systems Knowledge: Strong background in Linux administration, networking, and distributed systems.
- Monitoring & Observability: Hands-on experience with tools like Prometheus, Grafana, ELK/EFK, or Datadog.
- Scripting & Automation: Proficient in one or more languages (Python, Go, Bash, etc.).
- Problem Solving: Skilled at diagnosing complex issues, ensuring high availability, and improving system performance.
- System Design: Capable of designing fault-tolerant, secure, and scalable infrastructure with disaster recovery in mind.
- Languages: Proficiency in written and spoken English and Mandarin is highly desirable, as the role involves liaising with Chinese-speaking clients and counterparts to understand their technical requirements.

Soft Skills
- Team Mindset: Collaborate effectively across teams, proactively contributing to company goals.
- Ownership: Take responsibility for infrastructure health and ensure continuous improvements.
- Adaptability: Open to new technologies, evolving processes, and changing business needs.
- Communication: Clearly explain technical topics to both engineers and non-technical stakeholders.

What We Offer
- Technical Leadership Opportunities: Lead infrastructure design for high-impact projects and guide DevOps best practices.
- Continuous Growth: Access to mentorship, certifications, and a clear career progression path.
- High-Performance Collaboration: Work with a talented team in a modern DevOps environment (Agile/CI-CD, GitOps).
- Flexibility and Trust: An open culture that values innovation, autonomy, and results-driven decision-making.



  • Singapore Canonical Full time

    Senior Site Reliability / Gitops Engineer (posted 1 day ago). Canonical is a leading provider of open source software and operating...


  • Singapore GK CONSULTING PTE. LTD. Full time

    We're seeking an experienced Senior Site Reliability Engineer to ensure the reliability, availability, and performance of our cloud-based internet services. Key Responsibilities 1. Own reliability, availability, and user experience for assigned cloud services 2. Develop and implement service governance initiatives to increase reliability and user...


  • Singapore Qube Research & Technologies Full time

    DevOps / Site Reliability Engineer. Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all liquid asset classes across the world. We are a technology and data driven group implementing a scientific approach to investing. Combining data, research,...


  • Singapore TRUEWATCH TECHNOLOGY INC PTE. LTD. Full time

    Responsibilities: Run the production environment by monitoring availability and taking a holistic view of system health. Achieve site reliability automation, minimize system downtime, and reduce site reliability cost. Manage risks and resolve issues that affect the release scope, schedule and quality. Suggest architecture improvements, push for...


  • Singapore Ll Oefentherapie Full time

    At Oracle Cloud Infrastructure (OCI), we build the more intelligent future of cloud. OCI Sovereign Cloud is a team of smart, motivated, and diverse people that are focused on bringing the world's most important work to OCI. We build and operate our government, classified, and sovereign cloud regions to be reliable and high performance, just like our public...


  • Singapore Canonical Full time

    Senior Site Reliability Engineer. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform is Ubuntu, widely used in enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. We have 1200+...


  • Singapore Garena Careers Full time

    Senior/Expert Engineer, Site Reliability Engineering (SRE). Responsibilities: Deep dive into development lines, learning and understanding the mechanism of every application component, and promoting product scalability, stability and performance. Set up, manage and...


  • Singapore Dada Consultants Full time

    Key Responsibilities: Design, implement, and maintain highly available, scalable, and secure infrastructure. Develop and improve observability (monitoring, logging, alerting) across all services. Own incident response lifecycle: detection, mitigation, root cause...


  • Singapore Shopify Full time

    Company Description Shopify is the leading omni-channel commerce platform. Merchants use Shopify to design, set up, and manage their stores across multiple sales channels, including mobile, web, social media, marketplaces, brick-and-mortar locations, and pop-up shops. The platform also provides merchants with a powerful back-office and a single view of...