Site Reliability Specialist

7 months ago


Singapore, Singapore IHiS Full time

Position Overview

The Reliability Lead will support the Reliability Principal in strategy discussions with senior management on application and system improvement, and will also manage the Reliability team.

He/She will ensure that existing site reliability engineering (SRE) initiatives, such as monitoring availability, uplifting capability and automation, are on track. He/She will also assist the Reliability Principal and Engineering Teams in reviewing the reliability program to take stock of successes and challenges and to refine the program. He/She will be in charge of the management reports that describe the current situation and recommend the next steps.

As Lead of the Reliability team, which consists of experienced engineers and product specialists, he/she will coach the engineering and service management teams to help them improve application reliability through tools, monitoring and prevention activities. He/She will collaborate with the applications, incident management (IOC) and infrastructure support teams to identify and implement procedures, tools and scripts that improve reliability, reduce downtime and increase automation.

Role & Responsibilities

• Strive for automation either by coding it or by leading and influencing engineers to build systems that are easy to run in production

• Identify significant projects that result in substantial cost savings

• Identify changes to the production architecture from a reliability, performance and availability perspective, using a data-driven approach

• Proactively work on efficiency and capacity planning to set clear requirements and reduce system resource usage, lowering operating costs for all our customers

• Identify parts of the system that do not scale, provide immediate palliative measures and drive long-term resolution of the resulting incidents

• Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives

• Know a domain really well and radiate that knowledge through recorded demos, discussions in DNA (Design and Automation) meetings, or Incident Reviews

• Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again

• Set an example for the team of SREs through positive and inclusive leadership and discussion of work

• Show ownership of a major part of the infrastructure

• De-escalate any conflicts inside the team

Requirements

• Bachelor’s degree in computer science or other highly technical, scientific discipline

• Ability to program (structured and OO) with one or more high-level languages, such as Python, Java, C# and JavaScript

• Experience with infrastructure technologies such as operating systems (Windows and Linux), networking, storage and virtualisation

• Familiar with testing automation tools

• Have a sense of urgency to deliver & iterate fast

• A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

• Previous success in software engineering

• Specialise in 1 or 2 of the following:

  • Strong software engineering, with the ability to code fixes for defects or vulnerabilities in our systems

  • Use infrastructure automation tools such as Chef or Ansible to efficiently manage our infrastructure

  • Implement "Infrastructure as Code" using Terraform and CI/CD for automation

  • Load balancing and high-availability architecture of applications, including proxies and CDN, through the use of F5

  • OpenShift and containerizing our system

  • Administer and manage high-availability, high-performance Microsoft SQL Server or Oracle clusters

  • Monitoring and metrics in Dynatrace, ELK or eG, and integrations with Dynatrace / ITSM

  • Logging infrastructure

  • Key, certificate and secret management

  • Backend storage management and scaling

  • Disaster Recovery and High Availability strategy

NOTE: It only takes a few minutes to apply for a meaningful career in HealthTech - GO FOR IT

M-2022-2160


