Site Reliability Engineer
3 weeks ago
TikTok will be prioritizing applicants who have a current right to work in Singapore, and do not require TikTok's sponsorship of a visa.
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.
Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.
About the Team
The Applied Machine Learning (AML) - Enterprise team provides machine learning platform products on VolcanoEngine. These include a cloud-native resource scheduling system that intelligently orchestrates tasks and jobs to minimise the cost of every experiment and maximise resource utilisation; rich modelling tools, including customised machine learning tasks and a web IDE; and multi-framework, high-performance model inference services.
In 2021, we released this machine learning infrastructure to the public through VolcanoEngine, offering more enterprises lower computing costs, lower barriers to machine learning engineering, and deeper development of AI capabilities.
Responsibilities
Responsible for developing the Ark Large Model Platform on Volcano Engine: researching systematic solutions for implementing and applying large models across industries, reducing the IT cost of large model applications, meeting users' ever-growing demand for intelligent interaction, and improving how users live and communicate in the future.
- Manage and oversee the stability of both control and data aspects of large-scale model systems through effective DevOps practices.
- Develop and enhance observability systems for monitoring the stability of large model systems, ensuring high reliability and performance.
- Handle super large-scale cluster management and ensure efficient operation and maintenance of large model systems.
Qualifications
- B.Sc. or higher degree in Computer Science or a related field from an accredited and reputable institution.
- Minimum of 5 years of R&D experience in the fields of cloud computing or large-scale model systems.
- Proficiency in cloud-native technologies and understanding of the relevant technology stack.
- Expertise in one of the following programming languages: Golang, Python, or Java, with the ability to use it proficiently in a professional setting.
- Familiarity with cloud-native technologies for log collection, monitoring, and alerting.
Preferred Qualifications
- Prior experience in the construction and maintenance of stability systems for large-scale infrastructures.
- Experience in operating and maintaining large-scale systems.
- Experience with infrastructure as code, particularly Terraform, is highly desirable.
TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
-
Site Reliability Engineer
3 weeks ago
Singapore Adyen Singapore Pte. Ltd. Full time
This is Adyen. Adyen provides payments, data, and financial products in a single solution for customers like Meta, Uber, H&M, and Microsoft - making us the financial technology platform of choice. At Adyen, everything we do is engineered for ambition. For our teams, we create an environment with opportunities for our people to succeed, backed by the culture and...
-
Site Reliability Engineer
2 weeks ago
Singapore Wipro Limited Full time
Job Role: Site Reliability Engineer. Location: Singapore. Experience: 2+ years of relevant experience. Job Description: Responsibilities: Hands-on design, implementation, and extension of automation tools for infrastructure, application, and container management. Monitor Staging, Test and Development environments for a myriad of Products in an agile and dynamic...
-
Site Reliability Engineering
2 weeks ago
Singapore GEOTAB (ASIA) PTE. LTD. Full time
Roles & Responsibilities. What you'll do: As a part of Site Reliability Engineering, your key area of responsibility will be to provide stellar escalated engineering support and customer service to our core MyGeotab applications. You will support both the MyGeotab software application and the Geotab GO devices, IOX, and other hardware. Geotab receives detailed...
-
Site Reliability Engineer
7 days ago
Singapore LIVERAMP PTE. LTD. Full time
Roles & Responsibilities. ABOUT THIS JOB: The SRE team is responsible for owning and supporting deployments of global products, and providing first line operational support. We are looking for a Site Reliability engineer who is excited about establishing and advocating for best practices for product deployments and SRE. You will be able to leverage your software...
-
Site Reliability Engineer
1 week ago
Singapore Liveramp Pte. Ltd. Full time
ABOUT THIS JOB: The SRE team is responsible for owning and supporting deployments of global products, and providing first line operational support. We are looking for a Site Reliability engineer who is excited about establishing and advocating for best practices for product deployments and SRE. You will be able to leverage your software engineering expertise...
-
Site Reliability Engineering
2 weeks ago
Singapore Geotab (asia) Pte. Ltd. Full time
What you'll do: As a part of Site Reliability Engineering, your key area of responsibility will be to provide stellar escalated engineering support and customer service to our core MyGeotab applications. You will support both the MyGeotab software application and the Geotab GO devices, IOX, and other hardware. Geotab receives detailed data and metrics for...
-
ASE - Site Reliability Engineer
5 days ago
Singapore APPLE SOUTH ASIA PTE. LTD. Full time
Roles & Responsibilities. Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineer, to help support and scale cloud services for millions of Apple users. We are building and...
-
ASE - Site Reliability Engineering Manager
5 days ago
Singapore APPLE SOUTH ASIA PTE. LTD. Full time
Roles & Responsibilities. Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineering Manager, to help support and scale cloud services for millions of Apple users. This is a...
-
ASE - Site Reliability Engineering Manager
3 days ago
Singapore Apple South Asia Pte. Ltd. Full time
Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple's long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineering Manager, to help support and scale cloud services for millions of Apple users. This is a hands-on role, to establish...
-
ASE - Site Reliability Engineer
3 days ago
Singapore Apple South Asia Pte. Ltd. Full time
Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple's long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineer, to help support and scale cloud services for millions of Apple users. We are building and supporting new and existing...
-
Site Reliability Engineer
2 weeks ago
Singapore VIRTUOS HOLDINGS PTE. LTD. Full time
Roles & Responsibilities. Job Description: PLAY, GROW and WIN. To be a part of Virtuos means to be a creator. At Virtuos, we harness the latest technologies to make games better and more immersive than ever before. That is why we pride ourselves in constantly pushing the boundaries of possibility since our founding in 2004. Virtuosi is a team of...
-
Site Reliability Engineer
2 weeks ago
Singapore Virtuos Holdings Pte. Ltd. Full time
Job Description: PLAY, GROW and WIN. To be a part of Virtuos means to be a creator. At Virtuos, we harness the latest technologies to make games better and more immersive than ever before. That is why we pride ourselves in constantly pushing the boundaries of possibility since our founding in 2004. Virtuosi is a team of experts - people who have come together...
-
Site Reliability Engineer
2 weeks ago
Singapore Sciente Consulting Full time
Mandatory Skill-set: Bachelor's degree in Computer Science, Mathematics, Engineering, or any related field; Has 3 to 4 years of proven experience in monitoring application and systems; Expertise in Grafana, Elastic Stack (Elasticsearch, Logstash, Kibana, Beats), and Kafka, including setup, configuration, upgrades, patching, data management, monitoring,...
-
Site Reliability Expert Engineer
2 weeks ago
Singapore Shopee Full time
Job Description: Set up, deploy and configure marketplace services in the private cloud platform. Continuously improve the marketplace services in the private cloud, including but not limited to stress test automation, capacity management, service autoscaler, disaster recovery, chat operations, knowledge base management, SOP automation, dynamic service...
-
Site Reliability Engineer #IAC
6 days ago
Singapore RECRUIT EXPRESS PTE LTD Full time
Roles & Responsibilities. My client is looking for an experienced individual to join the SRE team. The individual will support production monitoring and is expected to be hands-on using technology. Job Requirements: Java programming experience (2+ years) or equivalent level of coding knowledge; Python/Shell scripting (2+ years) or data...
-
Site Reliability Engineer
2 weeks ago
Singapore DECODE TECH PTE. LTD. Full time
Roles & Responsibilities. Responsibilities: Build and implement CI/CD solutions in AWS environment. Automate the code delivery pipeline with the goal of one click deployments, rollbacks, and parameterized builds. Build, operate and maintain application infrastructure, infrastructure automation, and monitoring of infrastructure and applications. Work...
-
Site Reliability Engineer
2 weeks ago
Singapore ORION ACADEMY PTE. LTD. Full time
Roles & Responsibilities. Key Responsibilities: Build and implement CI/CD solutions in AWS environment. Automate the code delivery pipeline with the goal of one click deployments, rollbacks, and parameterized builds. Build, operate and maintain application infrastructure, infrastructure automation, and monitoring of infrastructure and applications. Work...
-
Engineer Reliability
2 weeks ago
Singapore GlobalFoundries Full time
About GlobalFoundries: GlobalFoundries is a leading full-service semiconductor foundry providing a unique combination of design, development, and fabrication services to some of the world's most inspired technology companies. With a global manufacturing footprint spanning three continents, GlobalFoundries makes possible the technologies and systems that...
-
Lead Site Reliability Engineer
1 week ago
Singapore GXS Bank Full time
Get to know our Team: We are living in dynamic times. Technology is reshaping how we live, and we want to use it to redefine how financial services are offered. Grab is the leading technology company in Southeast Asia offering everyday services to the masses. Singtel is Asia's leading communications group connecting millions of consumers and enterprises...
-
Senior Site Reliability Engineer
3 weeks ago
Singapore Shopee Full time
Job Description: Fun and energetic team culture with strong emphasis on learning, sharing and growth. Learning programme / roadmap for all new hires (applicable for both fresh / experienced). Wide exposure to enable rapid growth in personal skills and career. Deep dive into Marketplace core product lines. 50:50 time spent between technical operations and software...