Engineer, SRE

2 weeks ago


Singapore NodeFlair Full time

**Job Summary**:
**Salary**
S$6,000 - S$10,000 / Monthly

**Job Type**
Full time

**Seniority**

Junior

**Years of Experience**
At least 0 years

**Tech Stacks**
Go Kubernetes Python

We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong background in maintaining self-hosted Kubernetes clusters. Your primary focus will be ensuring the stability and reliability of our production environment. A smoothly running infrastructure supports our AI researchers by giving them a steady, dependable platform.
- Work closely with AI researchers to understand their workflow and infrastructure needs, optimizing the cluster configurations accordingly.
- Implement monitoring, alerting, and self-healing systems to ensure high availability and performance of the clusters.
- Collaborate with development teams to design and implement best practices for infrastructure as code (IaC).
- Drive automation initiatives to reduce manual toil and improve system resilience and scalability.
- Document system design and procedures, and provide guidance to researchers on advanced cluster usage.

**Job Requirements**
- Bachelor's degree or higher in Computer Science, Engineering, or related fields.
- Proven experience in managing self-hosted Kubernetes clusters in a production environment.
- Strong understanding of containerization, orchestration, and the Kubernetes ecosystem.
- Familiarity with AI workflows; a machine learning/deep learning research background is a plus.
- Proficiency in at least one programming language (e.g., Python, Go) and scripting skills for automation.
- Good working attitude and strong problem-solving, critical-thinking, and communication skills.


  • Cloud SRE Engineer

    3 weeks ago


    Singapore OCBC Full time

    Cloud SRE Engineer - Linux role at OCBC. Who We Are: As Singapore's longest established...

  • Engineer, SRE

    4 days ago


    Singapore Sea Limited Full time

    The SRE and Infrastructure teams in Sea Labs manage thousands of servers which serve millions of users. As an SRE Engineer, you will work with the team to improve the availability and reliability of our services, and drive our service management to the next level. - Engage in the design, implementation, testing and operation of our on-prem Kubernetes...

  • SRE Lead

    6 days ago


    Singapore Selby Jennings Full time

    Our client is a leading global investment firm seeking an SRE Lead to be based in its Singapore office, with a strong focus on Enterprise and Reference Data Systems. Key Responsibilities of SRE Lead: Design and implement automated solutions for operational efficiency and reliability. Troubleshoot and resolve production issues related to reference...

  • SRE Support Engineer

    2 weeks ago


    Singapore OPENSOURCE PTE. LTD. Full time

    Position: Site Reliability Engineer (SRE Support) **Responsibilities**: Demonstrate proficiency in automating manual tasks using Terraform scripting and other automation tools. Utilize Datadog as an observability tool to monitor and analyze system performance. Technical Skill Set: Strong expertise in Site Reliability Engineering (SRE) principles and...

  • SRE Lead

    2 weeks ago


    Singapore Selby Jennings Full time

    Our client is a leading global investment firm seeking an SRE Lead to be based in its Singapore office, with a strong focus on Enterprise and Reference Data Systems. Key Responsibilities of SRE Lead: Design and implement automated solutions for operational efficiency and reliability. Troubleshoot and resolve production issues related to reference...

  • Engineer, SRE

    1 day ago


    Singapore Rakuten International Full time

    Job Description: Rakuten International oversees 7 businesses with over 4,000 employees globally. The brand is recognized for its leadership and innovation in e-commerce, digital content, advertising, entertainment and communications, bringing the joy of discovery and access to more than 1 billion members across the world. Our teams deliver on the company's...

  • SRE Support Engineer

    2 weeks ago


    Singapore Opensource Pte Ltd. Full time

    Position: Site Reliability Engineer (SRE Support) **Responsibilities**: Demonstrate proficiency in automating manual tasks using Terraform scripting and other automation tools. Utilize Datadog as an observability tool to monitor and analyze system performance. Technical Skill Set: Strong expertise in Site Reliability Engineering (SRE) principles and...

  • SRE/DevOps Engineer

    6 days ago


    Singapore Skill Quotient Technologies Inc Full time

    **Role**: SRE/DevOps Engineer **Location**: Singapore **Payroll**: Skill Quotient **Experience**: 5-10 years **Requirements**: - **Experience**: 5+ years as a Platform Engineer or in a similar role such as DevOps or SRE. - **Cloud Proficiency**: Strong experience with AWS or equivalent cloud environments. - **Operating Systems**: Expertise in Windows and...

  • Cloud SRE Engineer

    6 days ago


    Singapore OCBC Full time

    Cloud SRE Engineer - Linux role at OCBC. Who We Are: As Singapore’s longest established...
