Site Reliability Engineer

3 days ago


Singapore | Second Talent | Full-time


Cluster Operations & Management

  • Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
  • Ensure optimal performance, scalability, and reliability of distributed systems

Infrastructure Platform Development

  • Design, build, and enhance infrastructure operation platforms
  • Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
  • Drive platform standardization and automation initiatives

High Availability & Reliability

  • Ensure maximum uptime for production services through proactive monitoring and incident response
  • Continuously optimize service architecture, deployment strategies, and operational processes
  • Implement and maintain SLA/SLO frameworks and reliability engineering practices

Automation & Process Improvement

  • Lead the development of automated operations and maintenance systems
  • Create self-service tools and workflows to improve team productivity
  • Establish best practices for Infrastructure as Code and configuration management

Required Qualifications

Experience & Education

  • 2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
  • Bachelor's degree in Computer Science, Engineering, or a related technical field preferred

Cloud & Infrastructure

  • Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
  • Strong understanding of large-scale internet architecture and distributed systems
  • Proven experience with infrastructure monitoring, logging, and observability tools

Technical Skills

  • Proficiency in scripting and automation using Shell, Python, or similar languages
  • Strong knowledge of containerization technologies (Kubernetes, Docker)
  • Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
  • Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch

Advanced Networking (Preferred)

  • Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies
  • Understanding of network security, load balancing, and traffic management
  • Knowledge of cloud-native networking patterns and best practices
