
Lead Service Reliability Engineer
2 weeks ago
**Job Summary**:
**Salary**
S$9,000 - S$16,500 / Monthly
**Job Type**
**Seniority**
Lead
**Years of Experience**
At least 8 years
**Tech Stacks**
Strategy, Zipkin, GitLab, CircleCI, AWS, Terraform, Docker, Jenkins, Go, Docker Swarm, Shell Script, Jaeger, Swarm, CI, ELK, EKS, Shell, Java, Grafana, Prometheus, Kubernetes, Ansible, Ruby, Python
As a Service Reliability Engineer (SRE), you will take a multifaceted approach to ensure technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you cultivate a collaborative culture, enabling organizations to meet and exceed their reliability and business objectives.
**Job responsibilities**:
- You will be responsible for understanding requirements or SRE goals in depth from both tech and business perspectives
- You will provide solutions to improve reliability, including identifying and implementing mechanisms and architectures that enable fault tolerance and a faster mean time to respond (MTTR) and mean time to detect (MTTD)
- You will be responsible for enhancing the incident management process, including the development of an incident prioritization matrix, triage, communication, mitigation, post-mortem analysis and implementation of corrective actions
- You will manage client stakeholder expectations and queries during production incidents, providing detailed technical analysis of issues and remediation plans for mitigation and prevention in future, and act as the interface for C-level executives, if or when needed
- You will be a liaison with client engineering teams, build trust and productive relationships with senior client stakeholders and team leads to influence them in making better decisions
- You will be responsible for identifying opportunities for enhancing system performance and reliability in alignment with business SLAs, SLOs, KPIs and objectives, and provide guidance and assistance to SRE teams in implementing the identified improvements
- You will oversee and mentor other SREs on the team, contributing to their growth and development
**Job qualifications**:
Technical skills
- You can program with one or more high-level languages such as Python, Golang, Shell scripting, Ruby or Java
- You are familiar with DevOps and GitOps practices, driving the integration of observability automation into CI/CD pipelines, e.g., GitLab, Jenkins, CircleCI or equivalent
- You have in-depth knowledge of configuration management and Infrastructure as Code (IaC) tools such as Terraform, Ansible, ARM and CloudFormation for provisioning and managing infrastructure
- You have expertise in observability, logs, tracing and monitoring tools such as Grafana (Loki and Tempo), Prometheus, Graylog, Jaeger, Zipkin, ELK stack or equivalent
- You have a strong understanding of container-based architecture and hands-on experience with orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
- You have a good understanding of essential concepts such as quality gates encompassing SLI/SLO/SLA, chaos engineering, golden signals, blameless postmortem methodologies, synthetic monitoring, distributed tracing, end-user monitoring and performance testing
- You have experience with network load balancing, security tech stacks, Transport Layer Security (TLS) and certificate management, and an understanding of standard networking protocols and configurations
Professional skills
- You have strong communication and articulation skills, and are proficient in English
- You are able to convey resolutions to audiences with varying degrees of technical/business proficiency and bring them to consensus
- You have excellent problem-solving and analytical skills, with a focus on continuous improvement
- You have good listening and presentation skills
- You solve challenging problems and difficult-to-debug issues with a never-give-up attitude
- You can collaborate with cross-functional engineering teams to conduct capacity planning and scalability assessments, and design solutions for handling current and future growth
- You have the ability to work under pressure, with composure, during production incidents
- You understand requirements provided by the client on both technical and business aspects, and can break them down for successful implementation
- You’re willing to be part of a rotation- and need-based, 24x7 available team.
**Other things to know**:
**Learning and development**:
There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This m
-
Lead Reliability Engineer
2 days ago
Singapore Smiths Group Full time Location: Asia Pacific, Singapore, Singapore - Ref: JOHNCRANEAPAC01493 - Division: John Crane - Job Function: Operations **Job Description**: The Lead Reliability Engineer will ensure effective and efficient service contract operation, principally through providing engineering and reliability support with the objective of improving overall equipment reliability,...
-
Lead Reliability Engineer
2 days ago
Singapore John Crane Full time The Lead Reliability Engineer will ensure effective and efficient service contract operation, principally through providing engineering and reliability support with the objective of improving overall equipment reliability, availability and capability. The person is responsible for managing and driving all Performance Plus contract related tasks and day to...
-
Reliability Engineer, Engineering Services
7 days ago
Singapore NodeFlair Full time **Job Summary**: **Salary** S$3,500 - S$6,800 / Monthly **Job Type** **Seniority** Mid **Years of Experience** At least 3 years **Tech Stacks** MODE **Purpose and Scope** The Reliability Engineer plays a crucial role in ensuring the optimal performance, availability, and lifespan of assets. The purpose of this role is to develop and implement...
-
Reliability Engineer
2 weeks ago
Singapore NE Digital Full time COMPANY DESCRIPTION NE Digital is the digital, data and technology organization that serves as a center of excellence to drive digital transformation for our group of NTUC Social Enterprises to meet the critical social needs of Singapore's community. Delivering innovative products and solutions, we empower our people to lead a better and meaningful life...
-
Reliability Engineer
2 weeks ago
Singapore Chevron Full time **Responsibilities for this position may include but are not limited to**: - Facilitates & stewards the roll out of global reliability initiatives, such as Facility Integrity & Reliability Management (FIRM), within SMP by engaging cross functional stakeholder - Responsible for plant reliability KPI tracking & reporting - Leads the assessment of Asset...
-
Reliability Engineer
7 days ago
Singapore Cognizant Full time **About the role** The Reliability Engineer ensures stability of the manufacturing plant, systems health, lifecycle management, user satisfaction. Prioritizing digital capabilities and infrastructure's reliability, performance, and efficiency is a must. All employees involved in the development and maintenance of these services must work collaboratively to...
-
Singapore JPMorganChase Full time Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Electronic Trading Service, you hold a leadership role in your team, demonstrate strong knowledge across multiple...
-
Reliability Engineer
2 days ago
Singapore Annexion Partners Pte Ltd Full time Location: Singapore - Discipline: - Client type: - Contact: Ethan Tan - Reference: 868 - Posted: about 1 hour ago. We are currently looking for a Reliability Engineer for a leading Data Centre Operator in the region, who will bring onboard with him/her knowledge on the DC market in Singapore to add value to the team. He/She will be able to work with a...
-
MTS Reliability engineer
3 weeks ago
Singapore GlobalFoundries Full time About GlobalFoundries: GlobalFoundries is a leading full-service semiconductor foundry providing a unique...