Site Reliability Engineer

3 weeks ago


Singapore TIKTOK PTE. LTD. Full time
Roles & Responsibilities

TikTok will be prioritizing applicants who have a current right to work in Singapore, and do not require TikTok's sponsorship of a visa.


TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.


Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.

To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.

At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.

Join us.


About the Team

The Applied Machine Learning (AML) - Enterprise team provides machine learning platform products on VolcanoEngine. These include a cloud-native resource scheduling system that intelligently orchestrates tasks and jobs, minimising the cost of every experiment and maximising resource utilisation; rich modelling tools, including customised machine learning tasks and a web IDE; and high-performance, multi-framework model inference services.


In 2021, we released this machine learning infrastructure to the public through VolcanoEngine, giving more enterprises lower computing costs, lower barriers to machine learning engineering, and deeper AI capabilities.


Responsibilities

Responsible for developing the Ark Large Model Platform on Volcano Engine and researching systematic solutions for implementing and applying large models across industries. The aim is to reduce the IT cost of large model applications, meet users' ever-growing demand for intelligent interaction, and improve how users live and communicate in the future.


- Manage and oversee the stability of both the control plane and the data plane of large-scale model systems through effective DevOps practices.

- Develop and enhance observability systems for monitoring the stability of large model systems, ensuring high reliability and performance.

- Manage super-large-scale clusters and ensure the efficient operation and maintenance of large model systems.


Qualifications

- B.Sc. or higher degree in Computer Science or a related field from an accredited and reputable institution.

- Minimum of 5 years of R&D experience in the fields of cloud computing or large-scale model systems.

- Proficiency in cloud-native technologies and understanding of the relevant technology stack.

- Expertise in one of the following programming languages: Golang, Python, or Java, with the ability to use it proficiently in a professional setting.

- Familiarity with cloud-native technologies for log collection, monitoring, and alerting.


Preferred Qualifications

- Prior experience in the construction and maintenance of stability systems for large-scale infrastructures.

- Experience in operating and maintaining large-scale systems.

- Experience with infrastructure as code, particularly Terraform, is highly desirable.


TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.



  • Singapore Adyen Singapore Pte. Ltd. Full time

    This is Adyen. Adyen provides payments, data, and financial products in a single solution for customers like Meta, Uber, H&M, and Microsoft - making us the financial technology platform of choice. At Adyen, everything we do is engineered for ambition. For our teams, we create an environment with opportunities for our people to succeed, backed by the culture and...


  • Singapore Wipro Limited Full time

    Job Role: Site Reliability Engineer. Location: Singapore. Experience: 2+ years of relevant experience. Job Description: Responsibilities: Hands-on design, implementation, and extension of automation tools for infrastructure, application, and container management. Monitor Staging, Test and Development environments for a myriad of Products in an agile and dynamic...


  • Singapore GEOTAB (ASIA) PTE. LTD. Full time

    Roles & Responsibilities. What you'll do: As a part of Site Reliability Engineering, your key area of responsibility will be to provide stellar escalated engineering support and customer service to our core MyGeotab applications. You will support both the MyGeotab software application and the Geotab GO devices, IOX, and other hardware. Geotab receives detailed...


  • Singapore LIVERAMP PTE. LTD. Full time

    Roles & Responsibilities. ABOUT THIS JOB: The SRE team is responsible for owning and supporting deployments of global products, and providing first line operational support. We are looking for a Site Reliability engineer who is excited about establishing and advocating for best practices for product deployments and SRE. You will be able to leverage your software...


  • Singapore Liveramp Pte. Ltd. Full time

    ABOUT THIS JOB: The SRE team is responsible for owning and supporting deployments of global products, and providing first line operational support. We are looking for a Site Reliability engineer who is excited about establishing and advocating for best practices for product deployments and SRE. You will be able to leverage your software engineering expertise...


  • Singapore Geotab (asia) Pte. Ltd. Full time

    What you'll do: As a part of Site Reliability Engineering, your key area of responsibility will be to provide stellar escalated engineering support and customer service to our core MyGeotab applications. You will support both the MyGeotab software application and the Geotab GO devices, IOX, and other hardware. Geotab receives detailed data and metrics for...


  • Singapore APPLE SOUTH ASIA PTE. LTD. Full time

    Roles & Responsibilities. Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineer, to help support and scale cloud services for millions of Apple users. We are building and...


  • Singapore APPLE SOUTH ASIA PTE. LTD. Full time

    Roles & Responsibilities. Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineering Manager, to help support and scale cloud services for millions of Apple users. This is a...


  • Singapore Apple South Asia Pte. Ltd. Full time

    Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple's long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineering Manager, to help support and scale cloud services for millions of Apple users. This is a hands-on role, to establish...


  • Singapore Apple South Asia Pte. Ltd. Full time

    Job Summary: Apple Services Engineering team is one of the most exciting examples of Apple's long-held passion for combining art and technology. Join Apple Services Engineering Cloud Service Infrastructure team, as a Site Reliability Engineer, to help support and scale cloud services for millions of Apple users. We are building and supporting new and existing...


  • Singapore VIRTUOS HOLDINGS PTE. LTD. Full time

    Roles & Responsibilities. Job Description: PLAY, GROW and WIN. To be a part of Virtuos means to be a creator. At Virtuos, we harness the latest technologies to make games better and more immersive than ever before. That is why we pride ourselves in constantly pushing the boundaries of possibility since our founding in 2004. Virtuosi is a team of...


  • Singapore Virtuos Holdings Pte. Ltd. Full time

    Job Description: PLAY, GROW and WIN. To be a part of Virtuos means to be a creator. At Virtuos, we harness the latest technologies to make games better and more immersive than ever before. That is why we pride ourselves in constantly pushing the boundaries of possibility since our founding in 2004. Virtuosi is a team of experts - people who have come together...


  • Singapore Sciente Consulting Full time

    Mandatory Skill-set: Bachelor's degree in Computer Science, Mathematics, Engineering, or any related field; Has 3 to 4 years of proven experience in monitoring application and systems; Expertise in Grafana, Elastic Stack (Elasticsearch, Logstash, Kibana, Beats), and Kafka, including setup, configuration, upgrades, patching, data management, monitoring,...


  • Singapore Shopee Full time

    Job Description: Set up, deploy and configure marketplace services in the private cloud platform. Continuously improve the marketplace services in the private cloud, including but not limited to stress test automation, capacity management, service autoscaler, disaster recovery, chat operations, knowledge base management, SOP automation, dynamic service...


  • Singapore RECRUIT EXPRESS PTE LTD Full time

    Roles & Responsibilities. My client is looking for an experienced individual to join the SRE team. The individual will support production monitoring and is expected to be hands-on using technology. Job Requirements: Java Programming Experience (2+ years) or equivalent level of coding knowledge; Python/Shell Scripting (2+ years) or data...


  • Singapore DECODE TECH PTE. LTD. Full time

    Roles & Responsibilities. Responsibilities: Build and implement CI/CD solutions in AWS environment. Automate the code delivery pipeline with the goal of one click deployments, rollbacks, and parameterized builds. Build, operate and maintain application infrastructure, infrastructure automation, and monitoring of infrastructure and applications. Work...


  • Singapore ORION ACADEMY PTE. LTD. Full time

    Roles & Responsibilities. Key Responsibilities: Build and implement CI/CD solutions in AWS environment. Automate the code delivery pipeline with the goal of one click deployments, rollbacks, and parameterized builds. Build, operate and maintain application infrastructure, infrastructure automation, and monitoring of infrastructure and applications. Work...

  • Engineer Reliability

    2 weeks ago


    Singapore GlobalFoundries Full time

    About GlobalFoundries: GlobalFoundries is a leading full-service semiconductor foundry providing a unique combination of design, development, and fabrication services to some of the world's most inspired technology companies. With a global manufacturing footprint spanning three continents, GlobalFoundries makes possible the technologies and systems that...


  • Singapore GXS Bank Full time

    Get to know our Team: We are living in dynamic times. Technology is reshaping how we live, and we want to use it to redefine how financial services are offered. Grab is the leading technology company in Southeast Asia offering everyday services to the masses. Singtel is Asia's leading communications group connecting millions of consumers and enterprises...


  • Singapore Shopee Full time

    Job Description: Fun and energetic team culture with strong emphasis on learning, sharing and growth. Learning programme / roadmap for all new hires (applicable for both fresh / experienced). Wide exposure to enable rapid growth in personal skills and career. Deep dive into Marketplace core product lines. 50:50 time spent between technical operations and software...