Senior Site Reliability Engineer

3 weeks ago


Singapore, Singapore TIKTOK PTE. LTD. Full time
About TikTok

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul, and Tokyo.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.

To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.

At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.

Join us.

About the Team

Our Compute Platform SRE team supports all Big Data services and products across the company. We are a newly established team, and we are looking for talented people like you to help shape its future. We are responsible for the reliability of all the company's major data warehouse products, services, and query engines, and we serve business needs across domains within TikTok. We look forward to welcoming you to the team.

Responsibilities:
• Lead a global SRE team for TikTok's Data Platform, distributed across the US and Singapore. Be responsible for the reliability of all of TikTok's major data warehouse products, services, and query engines, such as ClickHouse, Spark, Presto, and Doris.
• Uphold Service Level Agreements (SLAs): Ensure that all service level objectives and agreements for ByteDance's Data Platform services are met. Lead team members in responding promptly to system outages and issues.
• Continuous Performance Optimization: Lead the team in deeply analyzing service performance and reliability patterns to identify potential bottlenecks. Implement proactive measures to prevent service disruptions. Work with development teams to optimize application performance, ensuring that services run efficiently and that resources are used effectively.
• Incident Management: Build robust incident management mechanisms. Lead efforts to troubleshoot and resolve service incidents and to conduct postmortems. Coordinate with cross-functional teams to manage and mitigate service-impacting events.
• Infrastructure Automation: Lead the team in developing highly efficient toolchains covering end-to-end deployment and reliability assurance operations. Automate infrastructure provisioning, scaling, and management processes to reduce manual intervention and improve service quality. Develop and enhance system capabilities such as automatic failure detection, auto-healing, and chaos engineering, and run systematic disaster drills.
• Collaboration: Engage with product and development teams to integrate reliability and performance considerations into the software lifecycle.
• Capacity and Demand Planning: Assess and forecast infrastructure needs based on growth patterns and upcoming initiatives.
• Stay Updated: Keep current with industry trends, best practices, and emerging technologies related to site reliability and infrastructure engineering.

Minimum Qualifications:
• Bachelor's degree or above in Computer Science, Engineering, or a related field. Passionate about computer science and Internet technology.
• 5+ years of experience in the SRE domain. 2+ years of experience in team management.
• 5+ years of experience with, and an in-depth understanding of, Linux, computer networking, and databases. Proficient in common SRE/DevOps open-source toolsets, system monitoring tools, and container orchestration platforms such as Kubernetes.
• 5 years of experience with, or familiarity with, open-source or commercial technologies such as ClickHouse, Hadoop, Doris, Spark, Presto, and Kubernetes.
• 5+ years of coding experience in at least one scripting or programming language, including but not limited to Python, Shell, Java, and Go.

Preferred Qualifications:
• Excellent problem-solving skills and the ability to think critically under pressure. Start with the end state in mind, and be willing to take a moonshot.
• Strong written and verbal communication skills, with a customer-first mindset. Strong sense of ownership and easy to collaborate with.
• Able to collaborate effectively with partners and team members across time zones in different countries.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

  • Singapore, Singapore Sea Full time

    Our Infrastructure team provides the end-to-end managed services and solutions for the Group's entire Internet infrastructure alongside running business applications. We excel in building the architecture, providing solutions and operations of data centre, connectivity, cloud, networking, system, storage and security. We are a proud provider of high-quality...


  • Singapore, Singapore GEMINI Full time

    Department: Platform. Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Within Platform, the Site Reliability Engineering team is responsible for partnering with Gemini’s other engineering...


  • Singapore, Singapore Renesas Electronics Full time

    Job Description. Overview: We are seeking a skilled and experienced Site Reliability Engineer to join our team. In this role, you will be part of the AI & Cloud Engineering (ACE) Division and AI Workbench team. Our AI Workbench is a cloud-based environment to accelerate Automotive AI Software Development and Evaluation. The AI Workbench has 4 main functional...


  • Singapore, Singapore Encora Inc. Full time

    Site Reliability Engineer. Location: Singapore. Experience: 5 years. Job Mode: Full-time. Work Mode: On-site. The Site Reliability Engineer/Software Engineer is a contract position responsible for software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. As an SRE you will help to ensure that our services are reliable,...


  • Singapore, Singapore IHiS Full time

    Position Overview: The Reliability Lead will support the reliability principal with senior management in strategy discussions for application and system improvement, and will also manage the reliability team. He/She will ensure that the existing site reliability engineering (SRE) initiatives, such as monitoring availability, uplifting capability and automation...


  • Singapore, Singapore Shopee Full time

    Senior Site Reliability Engineer (Promotion) - Engineering Infra. Department: Engineering and Technology. Level: Experienced (Individual Contributor). Location: Singapore. The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best...


  • Singapore, Singapore NTT DATA Full time

    Job Description: NTT is a leading global IT solutions and services organisation that brings together people, data and things to create a better and more sustainable future. In today’s ‘iNTTerconnected’ world, connections matter more now than ever. By bringing together talented people, world-class technology partners and emerging innovators, we help our...


  • Singapore, Singapore Ripple Full time

    At Ripple, we’re building a world where value moves like information does today. It’s big, it’s bold, and we’re already doing it. Through our crypto solutions for financial institutions, businesses, governments and developers, we are improving the global financial system and creating greater economic fairness and opportunity for more people, in more...

  • Reliability Engineer

    4 weeks ago


    Singapore, Singapore Pfizer Full time

    Pfizer Singapore is recruiting permanent employees for the manufacturing site expansion of Pfizer Asia Manufacturing Pte Ltd (PAMPL) in Singapore. Why Patients Need You: Whether you are involved in the design and development of manufacturing processes for products or supporting maintenance and reliability, engineering is vital to making sure customers and...


  • Singapore, Singapore Sea Full time

    About Sea Labs Indonesia: Sea Labs is at the core of the Sea platforms' development, supporting diverse business lines from e-commerce, supply chain, games, payment and finance, among many others. The strong growth and unique positioning of Sea's e-commerce business, Shopee, spurred the launch of Sea Labs Indonesia. Since its inception, the group of passionate...


  • Singapore, Singapore NTT Full time

    JOB DESCRIPTION NTT is a leading global IT solutions and services organisation that brings together people, data and things to create a better and more sustainable future. In today’s ‘iNTTerconnected’ world, connections matter more now than ever. By bringing together talented people, world-class technology partners and emerging innovators, we help...


  • Singapore, Singapore CITADEL ENTERPRISE (SINGAPORE) PTE. LIMITED Full time

    Roles & Responsibilities: Citadel’s Site Reliability Engineers (SRE) work to bring their practices to the financial trading field by bringing innovation and cutting-edge technology to reduce complexity and improve performance. SREs are responsible for taking applications to production, providing early support for applications in development, and ensuring...


  • Singapore, Singapore Tower Research Capital Full time

    Responsibilities: Overseeing and ensuring the continuous operation of the firm's Linux-based trading infrastructure, addressing day-to-day operational needs. Providing second-level support, including: Driving the development of automated solutions for server provisioning, configuration, and monitoring, targeting a scalable management of thousands of servers ...


  • Reliability Intern

    2 weeks ago


    Singapore, Singapore Takeda Full time

    Description. Scope of Internship: The manufacturing site in Woodlands is a crucial hub in Takeda's Global Manufacturing and Supply network, focusing on agility, connectivity, performance, innovation, and people-centric values to enhance patient care. As a Reliability Engineering Intern, you will collaborate with key Takeda stakeholders to fulfill reliability...


  • Singapore, Singapore U3 Full time

    Job Opening: Operation Technician for Fresh Engineering Graduates. Location: Tuas. Support plant goals and objectives to achieve overall site KPIs. Adhere to safety guidelines, SOPs, policies, and standards. Integrate safety practices across all work areas and contribute safety suggestions and enhancements. Promptly report any unsafe activities or...


  • Singapore, Singapore IO TECH SOLUTIONS LIMITED Full time

    We are looking for a skilled Site Reliability Engineer to join our client's global SRE Team in Singapore. Responsibilities: Overseeing and ensuring the continuous operation of the firm's Linux-based trading infrastructure, addressing day-to-day operational needs. Providing second-level support, including: rapid response to emergencies; implementing scheduled...

