SRE Manager, OLAP Engine
2 months ago
About the team
TikTok and its affiliates are developing a next-generation high-performance analytical database, with a mission to enable efficient, real-time, data-driven decision-making on PB-level data sets. The initial product was forked from ClickHouse and has since undergone a major re-architecture. The product now not only improves on the efficiency of ClickHouse but also fits into elastic cloud-native infrastructure with better scalability and resource utilization. After years of refinement in internal EB-level scenarios, we are now ready to serve our business partners via various cloud vendors.
Our software engineers for product infrastructure combine software and systems engineering disciplines to run high-performance, large-scale distributed infrastructure. This means you will be deeply involved in the development lifecycle of critical software services, collaborating closely with product engineers and combining software code with systems knowledge to ensure that cloud-native OLAP engines are reliable, fault-tolerant, efficiently scalable and cost-effective. You will also leverage your software engineering expertise to develop platforms and tools that optimise the operational and engineering efficiency of complex systems at scale, with a particular focus on improving the systems' observability, performance and maintainability.
In this role, you will:
- Build and manage the Global SRE team, including recruitment, training of new talent, system operation, maintenance and coordination, and team culture building.
- Improve mechanisms for cooperation across teams, time zones and regions, and provide SRE solutions that fit actual business scenarios and business direction.
- Take responsibility for SRE team planning and project management, guiding basic SRE work to be more effective and improving overall SRE efficiency.
- Develop process specifications and plans for compliant access, configuration, disaster recovery and fault handling on the critical paths of overseas SRE services.
- Continuously improve the core SRE capabilities of the OLAP engine in efficiency, cost, quality, security and related areas.
- Develop automation, data visualization and automated monitoring processes to facilitate the optimization of the cloud-native OLAP engine infrastructure.
- Drive the design and engineering of tools and platform solutions to optimize product engineering and operational efficiency.
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.
Qualifications
- Bachelor's degree or above in Computer Science or a related technical discipline, and good English communication skills.
- Familiarity with SRE-related processes, an understanding of industry trends in SRE technology, and a strong ability to build an SRE system. 6+ years of SRE experience; big-data or OLAP engine SRE experience is a plus.
- Familiarity with SRE technologies, including Kubernetes, Terraform, Ansible and Bash scripting.
- Familiarity with the cloud computing technologies of Amazon Web Services, Google Cloud Platform and other providers.
- Expertise in the operation, deployment, troubleshooting, high availability and quality assurance of large-scale distributed systems, with a strong focus on stability and performance.
- A strong sense of responsibility, a proactive team spirit, and a strong ability to analyze and solve problems comprehensively.