Component Site Reliability Engineer
5 days ago
About TikTok
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.
Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.
About the Team
The e-commerce industry has seen tremendous growth in recent years and has become a hotly contested space amongst leading Internet companies; its future growth should not be underestimated. With millions of loyal users globally, we believe TikTok is an ideal platform to deliver a brand new and better e-commerce experience to our users. Our product engineering team is responsible for building an e-commerce ecosystem that is innovative, secure and intuitive for our users. We are looking for passionate and talented people to join us as we drive the future of e-commerce here at TikTok.
Software engineers in our product infrastructure role combine software and systems engineering disciplines to run high-performance, large-scale distributed infrastructure. This means you will be deeply involved in the development lifecycle of critical software services, collaborating closely with product engineers and applying both software and systems knowledge to ensure that TikTok e-commerce's services are reliable, fault-tolerant, efficiently scalable and cost-effective. You will also use your software engineering expertise to develop platforms and tools that improve the operational and engineering efficiency of complex systems at scale, with a particular focus on the systems' observability, performance and maintainability.
**Responsibilities**:
- Collaborate with cross-functional teams across time zones and regions. Leverage your expertise to provide feasible solutions using internal cloud-native products and environments for various business scenarios.
- Develop key operational and maintenance processes for critical components, including storage services, container computing services, microservices architecture, and network infrastructure.
- Continuously improve the capabilities of core services by optimizing efficiency, cost, quality, and security.
- Champion infrastructure-as-code principles, focusing on scalability and service resiliency through automation.
- Design and implement automation, data visualization, and automated monitoring processes to optimize core components.
- Actively participate in incident management and post-mortem analysis while adhering to best practices and contributing to on-call rotations.
**Qualifications**:
- Bachelor's or higher degree in Computer Science, Information Technology, Programming & System Analysis, Science (Computer Studies) or related discipline.
- 2+ years of related experience.
- Proficiency in Linux operating system internals, networking and microservices in cloud-native environments.
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience with Kubernetes, Docker, service mesh, MySQL, Redis, message queues (MQ), RocksDB, MongoDB, etc.
**Preferred Qualifications**:
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
- Experience developing in languages such as Java/C++/Go, or building platforms/tools with scripting languages such as Python/Bash.
- Experience with running production-grade web services at scale in a cloud-native environment.
- Experience with implementing observability solutions such as monitoring, logging and tracing in complex service meshes.
TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
-
Aircraft Component Reliability Engineer
2 weeks ago
Singapore Singapore Technologies Engineering Ltd Full time
Job ID: 16248 - Location: Aero - 600 West Camp Road, SG - Description: - **Aircraft Component Reliability Engineer** - Analyse on-wing reliability of aircraft components, detect abnormal component removals. Provide removal forecast and expected repair cost. - Communicate with aircraft and component manufacturers and repair shops to investigate abnormal...
-
Site Reliability Engineer
2 weeks ago
Singapore Bohan Group Full time
As a Site Reliability Engineer (SRE), you will drive operational excellence by combining deep technical knowledge with a strong focus on automation and tooling. **Your responsibilities will include**:
- Designing and implementing core components of a robust SRE framework across both new and legacy systems.
- Partnering with development and quantitative...
-
Site Reliability Engineer
1 week ago
Singapore Experis Full time
**Site Reliability Engineer**:
- Location: Singapore
- Job reference: BBBH133368_1699927914
- Salary: S$6000 - S$7500 per month
- Consultant name: Rajasekar Shirley Monisha
- Consultant contact no.: 6232 5244
- EA License No.: 02C3423
- Consultant Registration No.: R22106767
**Responsibilities**:
- Responsible for deployment, change, issues triage and...
-
Site Reliability Engineer
6 days ago
Singapore Second Talent Full time $80,000 - $120,000 per year
Cluster Operations & Management
- Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
- Ensure optimal performance, scalability, and reliability of distributed systems
Infrastructure Platform Development
- Design, build, and enhance infrastructure operation...
-
Site Reliability Engineer
2 weeks ago
Singapore Second Talent Full time
Infrastructure Platform Development
- Design, build, and enhance infrastructure operation platforms
- Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
- Drive platform standardization and automation initiatives
High Availability & Reliability
- Ensure maximum uptime for production services...
-
Site Reliability Engineer
2 weeks ago
Singapore Rapsys Technologies Full time
**Experience**: 4+ Years
**Location**: Changi, Singapore
**Roles and Responsibilities**:
2. Set up and operate the server infrastructure and software (Linux, Elasticsearch, Logstash, Grafana, Kibana, Kafka, Nginx) based on bank’s security standards and industry’s security standards.
3. Perform continuous improvement for the platform covering areas...
-
DevOps Engineer
2 weeks ago
Singapore TARDIS GROUP SINGAPORE PTE. LTD. Full time
**About the job**:
**Key Responsibilities**
Cluster Operations & Management
- Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
- Ensure optimal performance, scalability, and reliability of distributed systems
Infrastructure Platform Development
-...
-
Site Reliability Engineer
23 hours ago
Singapore Rapsys Technologies Full time
**Roles and Responsibilities**:
2. Set up and operate the server infrastructure and software (Linux, Elasticsearch, Logstash, Grafana, Kibana, Kafka, Nginx) based on bank’s security standards and industry’s security standards.
3. Perform continuous improvement for the platform covering areas such as: capacity planning, observability, monitoring,...
-
Site Reliability/devops Engineer
5 days ago
Singapore Experis Full time
**Site Reliability/DevOps Engineer**:
- Location: Singapore
- Job reference: BBBH133437_1700207249
- Salary: S$5500 - S$7500 per month
- Consultant name: Claudia Kueh Kee Jinq
- Consultant contact no.: 65515579
- EA License No.: 02C3423
- Consultant Registration No.: R1880247
**Responsibilities**:
- Responsible for deployment, change, issues triage and...
-
Senior Site Reliability Engineer
3 days ago
Singapore Oxford Knight Full time
Senior Site Reliability Engineer - Singapore or Hong Kong
**Salary**: up to 250-275k SGD base
**Summary**
A high-frequency prop trading firm with offices worldwide is looking for a skilled Senior Site Reliability Engineer developer to support and maintain their Linux trading infrastructure on a day-to-day basis. This is a pivotal role where you will lead...