Senior Site Reliability Engineer

1 week ago


Singapore GXS Bank Full time

**About the Team**:
Our team treats infrastructure and operations as software engineering problems. We are responsible for building and progressing software platforms that enable the provisioning and management of all Digibank services in safe, reliable, and scalable ways. We consistently challenge the status quo and use new technologies to build platforms and tooling for engineering teams. Join us and make significant decisions with a huge impact on building modern banking technology.

**About the Role**:
We treat infrastructure and operations as software engineering problems. Our mission is to build and evolve software platforms that enable the provisioning and management of all Digibank services in safe, reliable, and scalable ways. We consistently challenge the status quo and use new technologies to build platforms and tooling for engineering teams. In this role, you will make significant decisions with a huge impact on building modern banking technology. You will be part of a team responsible for designing and architecting new solutions and for finding creative ways to optimize existing ones, improving agility in managing infrastructure for hundreds of microservices in a stable and reliable way.

If you are:

- A strong believer in automating DevOps and SRE aspects such as infrastructure provisioning, deployment, observability, incident lifecycle, and uptime SLAs
- Bold enough to challenge, open to being challenged, and curious to learn and grow

This is the right place for you.

**Roles and Responsibilities**:

- Work with Kubernetes clusters hosted in AWS
- Use Infrastructure as Code tooling such as Terraform and Ansible to manage AWS, Azure, and Kubernetes resources
- Engage with the development teams throughout the life cycle to help develop software for reliability and scale, and coach teams on SRE best practices
- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
- Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
- Build and drive adoption for greater self-healing and resiliency patterns
- Design automated software and product upgrades, change management, and release management solutions
- Design, code, test and deliver software to automate manual operational work. Own your tools and services end to end.
- Optimize infrastructure performance and cost
- Be part of an on-call rotation for the team's tooling and 24x7 support coverage as needed
- Succeed, fail, and learn together with other talented people. We believe in an environment that provides an opportunity for growth and see education as an outcome of failure that gets us closer to the next breakthrough

**Qualifications**:

- Bachelor's degree in information systems, information technology, computer science, or similar.
- 5-7+ years of professional experience.
- Experience administering Kubernetes clusters
- Experience managing infrastructure as code using Terraform
- Direct production operations experience in a cloud environment.
- Experience contributing to technology and product strategy.
- Experience leading capability-building initiatives across diverse areas such as infrastructure and operations automation, observability, incident management, architecting HA systems, and other core engineering.
- Demonstrated experience in driving operational efficiency and transparency of a growing engineering organization.



  • Singapore Manus AI Full time

    1. Manage and maintain container clusters and other open-source component clusters across various business lines. 2. Build and enhance infrastructure operation platforms, including infrastructure management, CI/CD, monitoring/alerting, and logging systems. 3. Respond quickly to incidents and implement effective...


  • Singapore PERSOLKELLY Full time

    We have partnered with a renowned global leader in information and communications technology (ICT) infrastructure and smart devices. They provide full-stack, all-scenario products, solutions, and services to carriers, enterprises, governments, and individual consumers worldwide. Our client is looking for an enthusiastic Site Reliability Engineer to...


  • Singapore ByteDance Full time

    About the Team: The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet, providing cloud solutions and various infrastructure services and ensuring that they are scalable and reliable. Responsibilities: Build, expand,...


  • Singapore Garena Full time

    Senior/Expert Engineer, Site Reliability Engineering (SRE). 1 day ago. Get...


  • Singapore AKAMAI TECHNOLOGIES APJ PTE. LTD. Full time

    As a Senior Site Reliability Engineer, you will influence a wide array of teams. You will be responsible for the performance and reliability of Akamai’s delivery products by working with the Product, Engineering and Support teams to diagnose, mitigate and solve outages. You will have to solve some of the most complex problems in distributed systems at...


  • Singapore TikTok Full time

    Site Reliability Engineer - Data Management Suite. About the Team: The Data Management Suite team is building products that cover the whole lifecycle of the data pipeline, including data ingestion and integration, data development, data catalog, data security, and data governance. These products...


  • Singapore Canonical Full time

    Senior Site Reliability / Gitops Engineer. 1 day ago. Canonical is a leading provider of open source software and...


  • Singapore Mondrian Alpha Full time

    Senior Site Reliability Engineer / Trade Systems Engineer - Leading Systematic Hedge Fund - Singapore. 2 days ago. My client, a renowned systematic hedge fund with a global presence, is in search of a seasoned Trade Support Engineer...


  • Singapore Manpower Singapore Full time

    Responsibilities: Responsible for deployment, change, issue triage, and infrastructure management of overseas games and relevant components and systems, e.g. game monitor system, login services....


  • Singapore ByteDance Full time

    About the Team: Our team is dedicated to elevating the level of cybersecurity to fully support ByteDance as well as our clients' digital journey. We aim high at building next-generation cybersecurity. Rooted in years of practical experience in the enterprise security domain within ByteDance, the team now runs as a business. We provide a...