Site Reliability Engineer

7 days ago


Singapore GXS Bank Full time

Get to know the Role

We treat infrastructure and operations as software engineering problems. Our mission is to build and evolve software platforms that enable the provisioning and management of all Digibank services in safe, reliable, and scalable ways. We consistently challenge the status quo and use new technologies to build platforms and tooling for engineering teams. In this role, you will make significant decisions with a large impact on building modern banking technology. You will be part of a team responsible for designing and architecting new solutions and for finding creative ways to optimise existing ones, improving our agility in managing the infrastructure of hundreds of microservices in a stable and reliable way.

If you are:

  • A strong believer in automating DevOps and SRE aspects such as infrastructure provisioning, deployment, observability, incident lifecycle, and uptime SLAs
  • Bold enough to challenge, open to being challenged, and curious to learn and grow
then this is the right place for you.

The Day-to-Day Activities:
  • Work with Kubernetes clusters hosted in AWS
  • Use Infrastructure as Code tooling such as Terraform and Ansible to manage AWS, Azure, and Kubernetes resources
  • Engage with development teams throughout the life cycle to help develop software for reliability and scale, and coach teams on SRE best practices
  • Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
  • Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
  • Build and drive adoption for greater self-healing and resiliency patterns
  • Design automated software and product upgrades, change management, and release management solutions
  • Design, code, test and deliver software to automate manual operational work. Own your tools and services end to end.
  • Optimize infrastructure performance and cost
  • Be part of an on-call rotation for the team's tooling and 24x7 support coverage as needed
  • Succeed, fail, and learn together with other talented people. We believe in an environment that provides an opportunity for growth and see education as an outcome of failure that gets us closer to the next breakthrough
The Must-Haves:
  • Bachelor's degree in information systems, information technology, computer science, or similar.
  • 1-4+ years of professional experience.
  • Experience administering Kubernetes clusters
  • Experience managing infrastructure as code using Terraform
  • Direct production operations experience in a cloud environment.
  • Experience contributing to technology and product strategy.
  • Experience leading capability-building initiatives across diverse areas such as infrastructure and operations automation, observability, incident management, architecting HA systems, and other core engineering.
  • Demonstrated experience in driving operational efficiency and transparency of a growing engineering organization.


  • Singapore NLS Full time

    My client, a global hedge fund, is actively seeking a hands-on, highly skilled and motivated SRE to join their team. As an SRE, you will play a critical role in driving the adoption of Site Reliability Engineering practices within their organization. The ideal candidate will have a strong technical background and a passion for driving operational efficiency...


  • Singapore Retentia technology private limited Full time

    3+ years of experience in Site Reliability Engineering, DevOps, or a related field. - Strong knowledge of cloud platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes). - Experience with automation and configuration management tools (e.g., Terraform, Ansible, Chef, or Puppet). - Proficiency in at least one programming...


  • Singapore The Edge Asia Full time

    Our client is a US hedge fund and their Technology group is constantly improving the company’s IT infrastructure, positioning them at the forefront of a rapidly evolving technology landscape. They are a team of experts experimenting, discovering new ways to harness the power of open-source solutions, and embracing enterprise agile methodology. Their...


  • Singapore IFUN GAMES Full time

    **Responsibilities** - Design, implement, and maintain tools and processes for monitoring, alerting, and incident response - Collaborate with developers to improve the design and operation of systems, with a focus on reliability, performance, and scalability - Participate in on-call rotations to respond to incidents and handle escalations - Analyze system...


  • Central Singapore Emprego SG Full time

    **Location** Singapore, Central Singapore **Job Type** Permanent **Salary** 9,000 - 15,000 Per **Date Posted** 5 hours ago Additional Details **Job ID** 16908 **Job Views** 1 Roles & Responsibilities **Objectives of this Role** - Run the production environment by monitoring availability and taking a holistic view of system health Improve...


  • Singapore J P INFOTEC PTE. LTD. Full time

    **Site Reliability Engineer** **Responsibilities** - Support and/or own the deployment of global products including setting up production and internal environments - Provide 24/7 first line of Engineering support (via follow the sun teams in all regions) for any issues related to global product deployment, availability and internal operations support. -...


  • Singapore Experis Full time

    **Site Reliability Engineer**: - Location: Singapore - Job reference: BBBH133368_1699927914 - Salary: S$6000 - S$7500 per month - Consultant name: Rajasekar Shirley Monisha - Consultant contact no.: 6232 5244 - EA License No.: 02C3423 - Consultant Registration No.: R22106767 **Responsibilities**: - Responsible for deployment, change, issues triage and...


  • Singapore COMBUILDER PTE LTD Full time

    Roles & Responsibilities: We are seeking talented and driven professionals to join our Site Reliability Engineering (SRE) team. This role involves helping organizations enhance the availability, performance, and resilience of their applications and services through the deployment and administration of Observability Platforms. Key Responsibilities: Deploy and...


  • Singapore AKAMAI TECHNOLOGIES APJ PTE. LTD. Full time

    **Join our Site Reliability team**: **Help us shape the future of the Internet**: As a Senior Site Reliability Engineer, you will be responsible for: - Deploying, managing, and operating scalable, highly available, and fault-tolerant systems on the Akamai Zero Trust Cloud Platform - Analysing and improving security, stability, speed, and capacity of Akamai...


  • Singapore Ellwood Consulting Full time

    **Contract type**: Permanent **Location**: Singapore **Salary**: SGD8,000 - SGD12,000 per month **Contact name**: Roy Mok Zi An **Published**: about 3 hours ago Job description About our client For more than 60 years, my client has been a leader in the interactive and game industry. The company also creates, manufactures, and sells coin-operated...


  • Singapore FUNFLY PTE. LTD. Full time

    Roles & Responsibilities. Position Overview: As a site reliability engineer, you will be responsible for ensuring the smooth operation of game services by maintaining, monitoring, and responding to faults daily. You will develop automation tools to enhance operational efficiency and manage game servers for optimal performance. The role includes collaborating...


  • Singapore GK CONSULTING PTE. LTD. Full time

    Roles & Responsibilities: We're seeking an experienced Senior Site Reliability Engineer to ensure the reliability, availability, and performance of our cloud-based internet services. Key Responsibilities: 1. Own reliability, availability, and user experience for assigned cloud services 2. Develop and implement service governance initiatives to increase reliability...


  • Singapore Hays Full time

    **Your new company** One of the world's well-known Internet and video game companies, they are expanding in the global market and increasing headcount in Singapore as a regional hub. They currently provide hundreds of PC and mobile games across a wide range of genres in over 200 countries. **Your new role** As a Site Reliability Engineer,...


  • Singapore NomiSo Full time

    **Lead Site Reliability Engineer** **Pay**:10,000-12,000 SGD/Month **About NomiSo**: NomiSo is a product and services engineering company. We are a team of Software Engineers, Architects, Managers, and Cloud Experts with expertise in Technology and Delivery Management. Our mission is to Empower and Enhance the lives of our customers through simple...


  • Singapore DT One Full time

    **About DT One** **Key Responsibilities** - Run the production environment by monitoring availability and taking a holistic view of system health - Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve - Improve reliability, quality, security, and...


  • Singapore Visier Solutions Inc Full time

    Visier is the leader in people analytics and we believe in a 'people-first' approach to business strategy. Our innovative technology transforms the way that organisations make decisions, allowing them to elevate their employees and drive better business outcomes. Embarking on an exciting new chapter in our growth story, we are looking for talented...


  • Singapore NodeFlair Full time

    **Job Summary**: **Salary** S$7,000 - S$9,000 / Monthly **Job Type** **Seniority** Mid **Years of Experience** At least 4 years **Tech Stacks** Analytics Spring Shell OOP Logstash Chef Puppet UNIX Kibana Grafana Linux kafka Springboot Ansible Node.js Elasticsearch Python **NTT DATA Singapore PTE Ltd is a wholly owned subsidiary of NTT DATA Corp, a part...


  • Singapore Rakuten Asia Pte Ltd Full time

    The team’s overall mission is to boost each business unit’s revenue by standardizing operations, maximizing productivity, and providing high-quality service to our customers (internal/external). **Responsibilities**: - Design and develop SLAs, SLOs, and SLIs for services within the Business Unit. - Be involved in automating routine manual production/non-production...


  • Singapore FLOWDESK ASIA PTE. LTD. Full time

    Roles & Responsibilities. About the job: Are you passionate about maintaining robust and high-performing infrastructures? Do you thrive in managing complex network environments and ensuring system reliability? Join our infrastructure team and help us elevate operational excellence to new heights. As a Site Reliability Engineer at Flowdesk, you will be at the heart...


  • Singapore Experis Full time

    **Site Reliability/DevOps Engineer**: - Location: Singapore - Job reference: BBBH133437_1700207249 - Salary: S$5500 - S$7500 per month - Consultant name: Claudia Kueh Kee Jinq - Consultant contact no.: 65515579 - EA License No.: 02C3423 - Consultant Registration No.: R1880247 **Responsibilities**: - Responsible for deployment, change, issues triage and...