Site Reliability Specialist
6 months ago
Position Overview
The Reliability Lead will support the Reliability Principal in strategy discussions with senior management on application and system improvement, and will also manage the Reliability team.
He/She will ensure that existing site reliability engineering (SRE) initiatives, such as monitoring availability, uplifting capability and automation, are on track. He/She will also assist the Reliability Principal and Engineering Teams in reviewing the reliability program to take stock of successes and challenges and refine the program. He/She will be in charge of the management reports that describe the current situation and recommend the next steps.
As Lead of the Reliability team, which consists of experienced engineers and product specialists, he/she will coach the engineering and service management teams to help them improve application reliability through tools, monitoring and prevention activities. He/She will collaborate with the applications, incident management (IOC) and infrastructure support teams to identify and implement procedures, tools and scripts that improve reliability, reduce downtime and increase automation.
Role & Responsibilities
• Strive for automation either by coding it or by leading and influencing engineers to build systems that are easy to run in production
• Identify significant projects that result in substantial cost savings
• Identify changes for the production architecture from the reliability, performance and availability perspective with a data-driven approach
• Proactively work on efficiency and capacity planning to set clear requirements and reduce system resource usage, lowering operating costs for all our customers
• Identify parts of the system that do not scale, provide immediate palliative measures and drive long-term resolution of these incidents
• Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives
• Know a domain really well and radiate that knowledge through recorded demos, discussions in DNA (Design and Automation) meetings, or Incident Reviews
• Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again
• Set an example for the team of SREs with positive and inclusive leadership and discussion on work
• Show ownership of a major part of the infrastructure
• De-escalate any conflicts inside the team
Requirements
• Bachelor’s degree in computer science or other highly technical, scientific discipline
• Ability to program (structured and OO) with one or more high-level languages, such as Python, Java, C#, and JavaScript
• Experience with infrastructure technologies like Operating Systems (Windows and Linux), networking, storage, virtualisation
• Familiar with testing automation tools
• Have a sense of urgency to deliver & iterate fast
• A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
• Previous success in software engineering
Specialise in 1 or 2 of the following:
• Strong software engineering skills, able to code to resolve defects or vulnerabilities in our systems
• Use infrastructure automation tools such as Chef or Ansible to efficiently manage our infrastructure
• Implement "Infrastructure as Code" using Terraform and CI/CD for automation
• Load balancing and high availability architecture of applications, including proxies and CDN, through the use of F5
• OpenShift and containerizing our systems
• Administer and manage high-availability, high-performance Microsoft SQL Server or Oracle clusters
• Monitoring and metrics in Dynatrace, ELK or eG, and integrations with Dynatrace / ITSM
• Logging infrastructure
• Key, certificate and secret management
• Backend storage management and scaling
• Disaster Recovery and High Availability strategy
NOTE: It only takes a few minutes to apply for a meaningful career in HealthTech - GO FOR IT
#LI-IHIS11
M-2022-2160
-
Site Reliability Engineer
1 month ago
Singapore, Singapore Sea Full time
Job Title: Site Reliability Engineer. At Sea, our Infrastructure team is responsible for providing end-to-end managed services and solutions for our entire Internet infrastructure. We excel in building architecture, providing solutions, and operating data centers, connectivity, cloud, networking, systems, storage, and security. As a Site Reliability Engineer,...
-
Site Reliability Engineer
1 month ago
Singapore, Singapore Sea Full time
Our Infrastructure team provides the end-to-end managed services and solutions for the Group's entire Internet infrastructure alongside running business applications. We excel in building the architecture, providing solutions and operations of data centre, connectivity, cloud, networking, system, storage and security. We are a proud provider of high-quality...
-
Site Reliability Engineer
1 month ago
Singapore, Singapore Sea Full time
About Sea Labs: At Sea Labs, we're at the forefront of innovation, driving the development of cutting-edge technologies that power our e-commerce, supply chain, games, payment, and finance platforms. Our team in Indonesia is a key part of this journey, working closely with global teams to deliver exceptional user experiences. We're seeking a skilled Site...
-
Site Reliability Operations Engineer
4 weeks ago
Singapore, Singapore Sea Full time
At Sea, our Infrastructure team provides end-to-end managed services and solutions for our entire Internet infrastructure, alongside running business applications. We excel in building architecture, providing solutions and operations of data centre, connectivity, cloud, networking, system, storage and security. Our team is proud to provide high-quality and...
-
Senior Site Reliability Engineer
1 month ago
Singapore, Singapore Shopee Full time
About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our Engineering and Technology team in Singapore. As a key member of our team, you will be responsible for managing the technical operations of Shopee's core marketplace businesses, including product lines such as Shopee voucher management, Shopee discount/coins...
-
Site Reliability Engineer
1 month ago
Singapore, Singapore Tencent Full time
Job Summary: Tencent Games is seeking a skilled Site Reliability Engineer to maintain the stability and performance of our overseas cloud platforms. As a key member of our team, you will be responsible for monitoring and resource management, ensuring the smooth operation of our data platforms and services. Key Responsibilities: Design and implement automatic...
-
Expert/Senior Site Reliability Engineer
2 months ago
Singapore, Singapore Sea Full time
Our Infrastructure team provides the end-to-end managed services and solutions for the Group's entire Internet infrastructure alongside running business applications. We excel in building the architecture, providing solutions and operations of data centre, connectivity, cloud, networking, system, storage and security. We are a proud provider of high-quality...
-
Site Reliability Engineer
3 days ago
Singapore, Singapore Wibit Consulting & Services (WibitCS) Full time
In Collaboration, we are building the backbone of reliable cloud solutions! Your Mission as a Site Reliability Engineer (SRE): Ensure the stability and performance of Yealink's overseas cloud operations. Tackle performance bottlenecks and implement creative solutions. Master operational tasks like incident management, service requests, and system...
-
Senior Site Reliability Engineer
1 month ago
Singapore, Singapore Sea Full time
About Sea Labs: At Sea Labs, we're at the forefront of the Sea platform's development, supporting diverse business lines across e-commerce, supply chain, games, payment, and finance. Our strong growth and unique positioning have led to the launch of Sea Labs Indonesia, where passionate engineers drive the best experience for our users in Indonesia and...
-
Site Reliability Engineer II
1 month ago
Singapore, Singapore Ripple Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team in Singapore. As a key member of our infrastructure team, you will be responsible for ensuring the high availability and scalability of our systems. Key Responsibilities: Design, implement, and maintain high availability systems and infrastructure. Collaborate with...
-
Site Reliability Engineer
1 month ago
Singapore, Singapore StarHub Full time
Job Description: We are looking for a talented and motivated Site Reliability Engineer (SRE) to join our team. This role requires a mix of infrastructure expertise, hands-on observability experience, and DevOps skills. As an SRE, you will be instrumental in building reliable, scalable, and efficient systems. The ideal candidate will have hands-on...
-
Site Reliability Engineer
1 month ago
Singapore, Singapore StarHub Full time
Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at StarHub. As a Site Reliability Engineer, you will play a crucial role in designing, deploying, and managing scalable infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible, and GitHub. Key Responsibilities: Design and...
-
Staff Site Reliability Engineer, Platform
3 months ago
Singapore, Singapore GEMINI Full time
Department: Platform. Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Platform focuses on building a scalable and secure foundations platform, enabling Engineering to deploy, validate,...
-
Site Reliability Engineer, VP
2 weeks ago
Singapore, Singapore Blackstone Full time
Blackstone is the world’s largest alternative asset manager. We seek to create positive economic impact and long-term value for our investors, the companies we invest in, and the communities in which we work. We do this by using extraordinary people and flexible capital to help companies solve problems. Our $ trillion in assets under management include...
-
Site Reliability Engineer
4 weeks ago
Singapore, Singapore DBS Bank Full time
Job Summary: DBS Bank is seeking a highly skilled Site Reliability Engineer to join our Consumer Banking Group Technology team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our production systems. Key Responsibilities: Facilitate and drive recovery calls for major incidents, coordinating with...
-
Staff Site Reliability Engineer, Platform
4 weeks ago
Singapore, Singapore GEMINI Full time
About the Role: As a Staff Site Reliability Engineer on Gemini's Platform team, you will play a crucial role in leading our engineering teams towards modern DevOps practices. You will develop and provide modern automation and operational tooling, and work cross-functionally across Gemini's engineering teams to influence and shape our development practices and...
-
Site Reliability Engineer Lead
1 month ago
Singapore, Singapore DBS Bank Full time
Job Summary: DBS Bank is seeking a highly skilled Site Reliability Engineer Lead to join our team. As a key member of our Technology and Operations group, you will be responsible for ensuring the operation stability and excellence within the unit. Key Responsibilities: Ensure the 24/7 operation teams are equipped with the right skillset and tools to manage...
-
Site Reliability Engineer II
6 months ago
Singapore, Singapore Ripple Full time
At Ripple, we’re building a world where value moves like information does today. It’s big, it’s bold, and we’re already doing it. Through our crypto solutions for financial institutions, businesses, governments and developers, we are improving the global financial system and creating greater economic fairness and opportunity for more people, in more...
-
Electrical Reliability Engineer
1 month ago
Singapore, Singapore Celanese Corporation Full time
Job Summary: Celanese Corporation is seeking a highly skilled Electrical Reliability Engineer to join our team. As a key member of our electrical discipline, you will be responsible for enhancing electrical reliability and ensuring all KPIs are met. Key Responsibilities: Provide technical subject matter expertise to enhance electrical reliability and ensure all...
-
Site Reliability Engineer
6 months ago
Singapore, Singapore Helius Full time
Job Scope: Code implementation of the existing service infrastructure (IaC); operation and performance improvement of applications and middleware; network construction and operation on AWS or GCP; development and operation of tools for automation of operations such as CI/CD; construction and operation of monitoring environment for fault detection and...