Site Reliability Engineer

1 month ago


Singapore, Singapore · Sea · Full time

The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve problems at hand; we build foundations for a long-lasting future. We don't limit ourselves in what we can or can't do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience it first-hand if you love technology as much as we do.

About Team

The mission of the SRE (Site Reliability Engineering) team is to keep Shopee running efficiently and sustainably 24x7, and to build and maintain large-scale, highly available, high-performance distributed systems. The team combines traditional software engineering with technical operations. SRE engineers dive deep into Shopee's development lines to ensure that the system stays highly scalable as it evolves rapidly. From the perspective of stability and performance, the team's scope covers the design of business applications, components of the basic platform (middleware, container scheduling, caching, object storage, etc.), OS optimisation, and data centre and network optimisation. We use engineering and service-based approaches to streamline the inefficient and complicated work of the traditional operations and maintenance model, and we are committed to building a sound monitoring system that improves the efficiency of incident handling.


Job Description
• Deep dive into development lines, learn and understand the mechanism of every application component, and promote product scalability, stability, and performance
• Set up, manage, and maintain Shopee product/middleware/big-data applications and services
• Perform regular and ad-hoc server-side deployments, improve performance, and troubleshoot issues
• Design and develop an automated technical operations platform
• Manage capacity and resources
• Take responsibility for full-chain stress tests to enhance application performance and remove redundancy
• Prepare routine operation documentation

Requirements

• Bachelor's degree or above in Computer Science, Engineering, Information Systems or related fields
• More than 2 years of relevant experience (candidates with no working experience are welcome to apply)
• Extensive, hands-on knowledge of Linux operating systems (Ubuntu, CentOS, etc.)
• Highly familiar with computer networks (TCP/IP, DNS, etc.), computer organisation, and operating systems
• Hands-on experience with at least one of the following programming languages: Bash, Python, Go
• Strong analytical and problem-solving skills, with the ability to thrive in a dynamic work environment
• Passionate, with a strong sense of responsibility
• Fast learner and a good team player
• Agile and detail-oriented

Skills below are optional but preferred:

• Experience with automation tools like Ansible or SaltStack
• Experience with monitoring tools like Prometheus, Zabbix, Grafana, etc.
• Experience with load balancing tools like LVS, Nginx, OpenResty or HAProxy
• Experience with container technology such as Docker and Kubernetes
• Experience with high availability system design and server deployment processes
• Experience with SRE practices
• Experience with an Ops PaaS platform or Ops automation platform (e.g., CMDB)

  • Singapore, Singapore · Sea · Full time

    Our Infrastructure team provides the end-to-end managed services and solutions for the Group's entire Internet infrastructure alongside running business applications. We excel in building the architecture, providing solutions and operations of data centre, connectivity, cloud, networking, system, storage and security. We are a proud provider of high-quality...


  • Singapore, Singapore · Qlik · Full time

    Description: What makes us Qlik? A Gartner Magic Quadrant Leader for 14 years in a row, Qlik transforms complex data landscapes into actionable insights, driving strategic business outcomes. Serving over 40,000 global customers, our portfolio leverages pervasive data quality and advanced AI/ML capabilities that lead to better decisions, excel in...


  • Singapore, Singapore · GEMINI · Full time

    Department: Platform. Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Platform focuses on building a scalable and secure foundations platform, enabling Engineering to deploy, validate,...


  • Singapore, Singapore · IHiS · Full time

    Position Overview: The Reliability Lead will support the reliability principal in strategy discussions with senior management on application and system improvement, and will also manage the reliability team. He/She will ensure that the existing site reliability engineering (SRE) initiatives, such as monitoring availability, uplifting capability and automation...


  • Singapore, Singapore · StarHub · Full time

    Job Description: We are looking for a talented and motivated Site Reliability Engineer (SRE) to join our team. This role requires a mix of infrastructure expertise, hands-on observability experience, and DevOps skills. As an SRE, you will be instrumental in building reliable, scalable, and efficient systems. The ideal candidate will have hands-on...


  • Singapore, Singapore · Wibit Consulting & Services (WibitCS) · Full time

    In collaboration, we are building the backbone of reliable cloud solutions! Your mission as a Site Reliability Engineer (SRE): ensure the stability and performance of Yealink's overseas cloud operations. Tackle performance bottlenecks and implement creative solutions. Master operational tasks like incident management, service requests, and system...


  • Singapore, Singapore · Blackstone · Full time

    Blackstone is the world’s largest alternative asset manager. We seek to create positive economic impact and long-term value for our investors, the companies we invest in, and the communities in which we work. We do this by using extraordinary people and flexible capital to help companies solve problems. Our $ trillion in assets under management include...


  • Singapore, Singapore · Qlik · Full time

    Description: What makes us Qlik? A Gartner Magic Quadrant Leader for 14 years in a row, Qlik transforms complex data landscapes into actionable insights, driving strategic business outcomes. Serving over 40,000 global customers, our portfolio leverages pervasive data quality and advanced AI/ML capabilities that lead to better decisions, faster. We excel...


  • Singapore, Singapore · Ripple · Full time

    At Ripple, we’re building a world where value moves like information does today. It’s big, it’s bold, and we’re already doing it. Through our crypto solutions for financial institutions, businesses, governments and developers, we are improving the global financial system and creating greater economic fairness and opportunity for more people, in more...


  • Singapore, Singapore · Shopee · Full time

    Senior Site Reliability Engineer (Promotion) - Engineering Infra Department. Engineering and Technology. Level: Experienced (Individual Contributor). Location: Singapore. The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best...


  • Singapore, Singapore · Vortexa · Full time

    Vortexa was founded to solve the immense information gap that exists in the energy industry. By using massive amounts of new satellite data and pioneering work in artificial intelligence, Vortexa creates an unprecedented view on the global seaborne energy flows in real-time, bringing transparency and efficiency to the energy markets and society as a...


  • Singapore, Singapore · Celanese Corporation · Full time

    Responsibilities: Job Description - Senior Reliability Engineer (Electrical) / Electrical Subject Matter Expert. Electrical Reliability and Maintenance: - Provide technical subject matter expertise to enhance electrical reliability and ensure all KPIs are met. - Improve reliability of electrical equipment by implementing repair...


  • Singapore, Singapore · TikTok · Full time

    About the team Our Compute Platform SRE team supports all Big Data services and products across the company. We are a newly established team and waiting for talents like you to shape the team's future together. We are responsible for the reliability of all the company's major data warehouse products, services, and query engines. We serve business needs...


  • Singapore, Singapore · Garena · Full time

    Job Description: Deep dive into development lines, learning and understanding the mechanism of every application component, and promoting product scalability, stability and performance. Set up, manage and maintain product, middleware, big-data applications and services. Perform regular and ad-hoc server-side deployments, performance fine-tuning and...

  • Reliability Engineer

    4 months ago


    Singapore, Singapore · Broadcom Inc. · Full time

    Job Description: As part of the quality and reliability team you will be responsible for setting and managing...


  • Singapore, Singapore · Sea · Full time

    About Sea Labs: Sea Labs is at the core of the Sea platform development, supporting diverse business lines from e-commerce, supply chain, games, payment, and finance, among many others. The strong growth and unique positioning of Sea's e-commerce business, Shopee, spurred the launch of Sea Labs Indonesia. Since its inception, passionate engineers have charted...


  • Singapore, Singapore · Salve.Inno Consulting · Full time

    In this role, you will ensure the stability and performance of the platform, proactively address incidents, and continuously improve operational efficiency. You’ll work closely with a dynamic team and report to the Operations and Development Supervisor in China. Key Responsibilities: Oversee overseas cloud operations, maintaining high platform stability and...


  • Singapore, Singapore · Helius · Full time

    Job Scope: code implementation of the existing service infrastructure (IaC); operation and performance improvement of applications and middleware; network construction and operation on AWS or GCP; development and operation of tools for automation of operations such as CI/CD; construction and operation of monitoring environment for fault detection and...


  • Singapore, Singapore · Celanese Corporation · Full time

    Responsibilities: • Responsible for improving the reliability of static equipment by developing and implementing effective reliability strategies. • Provide technical subject matter expertise for static equipment based on engineering codes, SEPs and RAGAGEPs. • Identify and eliminate recurring problems and bad actors by analyzing failure...


  • Singapore, Singapore · United Overseas Bank · Full time

    AVP Site Reliability Engineer, Group Infrastructure Platform Services Posting Date: 21-May-2023 Location: Singapore, Singapore Company: United Overseas Bank Ltd About UOB United Overseas Bank Limited (UOB) is a leading bank in Asia with a global network of more than 500 branches and offices in 19 countries and territories in Asia Pacific,...