SRE Manager, OLAP Engine

2 months ago


Singapore, Singapore TikTok Full time

About the team

TikTok and its affiliates are developing a next-generation high-performance analytical database, with a mission to enable efficient, real-time, data-driven decision-making on PB-level data sets. The initial product was forked from ClickHouse, after which it underwent a major re-architecture. The product now not only improves on the efficiency of ClickHouse but also fits into elastic cloud-native infrastructure with better scalability and resource utilization. After years of refinement in internal EB-level scenarios, we are now ready to serve our business partners via various cloud vendors. Our software engineers for product infrastructure combine software and systems engineering disciplines to run high-performance, large-scale distributed infrastructure. This means you will be deeply involved in the development lifecycle of critical software services, collaborating closely with product engineers to combine software code and systems knowledge so that cloud-native OLAP engines are reliable, fault-tolerant, efficiently scalable and cost-effective. You will also apply your software engineering expertise to develop platforms and tools that optimize the operational and engineering efficiency of complex systems at scale, with a particular focus on improving the systems' observability, performance and maintainability.
In this role, you will:
- Build and manage the Global SRE team, including recruitment, training of new talent, system operation, maintenance and coordination, and team culture building.
- Improve mechanisms for cooperation across teams, time zones and regions, and provide SRE solutions that fit actual business scenarios and business direction.
- Take responsibility for SRE team planning and project management, guiding basic SRE work to be more effective and improving overall SRE efficiency.
- Develop process specifications and plans for compliant access, configuration, disaster recovery and fault handling on the critical paths of overseas SRE services.
- Continuously improve the core SRE capabilities of the OLAP engine in efficiency, cost, quality, security, etc.
- Develop automation, data visualization and automated monitoring processes to facilitate the optimization of the cloud-native OLAP engine infrastructure.
- Drive the design and engineering of tools, as well as platform solutions, to optimize product engineering and operational efficiency.
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

- Bachelor's degree or above in Computer Science or a related technical discipline, and good English communication skills.
- Familiar with SRE-related processes, with an understanding of industry trends in SRE technology and a strong ability to build an SRE system. 6+ years of SRE experience; big-data or OLAP engine SRE experience is a plus.
- Familiar with SRE technologies, including Kubernetes, Terraform, Ansible, Bash scripting, etc.
- Familiar with the cloud computing technologies of Amazon Web Services, Google Cloud Platform and other providers.
- Expertise in the operation, deployment, troubleshooting, high availability and quality assurance of large-scale distributed systems, with a strong focus on stability and performance.
- A strong sense of responsibility, a proactive team spirit, and a strong ability to analyze and solve problems comprehensively.