Lead Site Reliability Engineer, Cloud Technology

3 days ago


Singapore JPMorganChase Full time

Public Cloud SRE is responsible for engineering and operating the cloud infrastructure and platforms of JPMC, ensuring reliability, resiliency, and security. We have an opening for a Senior Software Engineer, Site Reliability, to build the infrastructure and tooling for JPMC’s Public Cloud Platform.

As a Lead Site Reliability Engineer at JPMorgan Chase within Cloud Reliability Services, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You take the lead on resiliency design reviews, break complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.

**Job responsibilities**
- Engage in and improve the lifecycle of cloud services, from inception and design through deployment and operation
- Automate repeated manual tasks and develop tools to improve the efficiency of the platform and infrastructure
- Analyze defects, propose improvements and drive efficiencies in systems and processes.
- Help develop new cloud engineering strategies and implementations for the firm
- Ensure the reliability, availability, and performance of the cloud infrastructure and platform
- Demonstrate site reliability principles and practices every day and champion the adoption of site reliability throughout your team
- Develop observability and telemetry tools.
- Author and improve the quality of technical engineering documentation
- Debug and solve issues in a production environment
- Participate in SRE on-call rotations and escalation workflows

**Required qualifications, capabilities, and skills**
- Formal training or certification in software engineering or site reliability engineering and 5+ years of applied experience
- Bachelor’s Degree in Computer Science or equivalent
- Expertise in building solutions with AWS cloud services.
- Knowledge of Infrastructure as Code tools such as Terraform
- Fluency in at least one programming language, such as Python or Java
- Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
- Experience with troubleshooting common networking technologies and issues
- Ability to identify and solve problems related to complex data structures and algorithms
- Drive to self-educate and evaluate new technology
- Ability to teach new programming languages to team members
- Ability to collaborate across different levels and stakeholder groups
- Excellent communication skills working with stakeholders and domain experts across the company to design solutions to user problems
- Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive

**Preferred qualifications, capabilities, and skills**
- AWS certifications will be a bonus.


