Site Reliability Specialist

4 weeks ago


Singapore, Singapore IHiS Full time

Position Overview

The Reliability Lead will support the Reliability Principal in strategy discussions with senior management on application and system improvement, and will also manage the reliability team.

He/She will ensure that existing site reliability engineering (SRE) initiatives, such as monitoring availability, uplifting capability and automation, are on track. He/She will also assist the Reliability Principal and Engineering Teams in reviewing the reliability program to take stock of successes and challenges and to refine the program. He/She will be in charge of the management reports that describe the current situation and recommend next steps.

As Lead of the Reliability team, which consists of experienced engineers and product specialists, he/she will coach the engineering and service management teams to help them improve application reliability through tools, monitoring and prevention activities. He/She will collaborate with the applications, incident management (IOC) and infrastructure support teams to identify and implement procedures, tools and scripts that improve reliability, reduce downtime and increase automation.

Role & Responsibilities

• Strive for automation either by coding it or by leading and influencing engineers to build systems that are easy to run in production

• Identify significant projects that result in substantial cost savings

• Identify changes to the production architecture from the reliability, performance and availability perspective, using a data-driven approach

• Proactively work on efficiency and capacity planning to set clear requirements and reduce system resource usage, lowering operating costs for all our customers

• Identify parts of the system that do not scale, provide immediate palliative measures and drive long-term resolution of these incidents

• Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives

• Know a domain really well and radiate that knowledge through recorded demos, discussions in DNA (Design and Automation) meetings, or Incident Reviews

• Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again

• Set an example for the team of SREs with positive and inclusive leadership and discussion of work

• Show ownership of a major part of the infrastructure

• De-escalate any conflicts inside the team

Requirements

• Bachelor’s degree in computer science or another highly technical, scientific discipline

• Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C# and JavaScript

• Experience with infrastructure technologies such as operating systems (Windows and Linux), networking, storage and virtualisation

• Familiarity with test automation tools

• A sense of urgency to deliver and iterate fast

• A proactive approach to spotting problems, areas for improvement and performance bottlenecks

• Previous success in software engineering

Specialise in 1 or 2 of the following:

• Strong software engineering, with the ability to write code that resolves defects or vulnerabilities in our systems

• Using infrastructure automation tools such as Chef or Ansible to manage our infrastructure efficiently

• Implementing "Infrastructure as Code" using Terraform and CI/CD for automation

• Load balancing and high-availability architecture of applications, including proxies and CDN, through the use of F5

• OpenShift and containerizing our systems

• Administering and managing high-availability, high-performance Microsoft SQL Server or Oracle clusters

• Monitoring and metrics in Dynatrace, ELK or eG, and integrations with Dynatrace / ITSM

• Logging infrastructure

• Key, certificate and secret management

• Backend storage management and scaling

• Disaster Recovery and High Availability strategy

NOTE: It only takes a few minutes to apply for a meaningful career in HealthTech - GO FOR IT


M-2022-2160


