Site Reliability Engineer
2 weeks ago
Responsibilities
About the team: TikTok Shop is a content e-commerce business utilising international short video products as carriers. Our aim is to become the preferred choice for users seeking to discover and purchase affordable, high-quality products. We provide users with tailored, vibrant, and efficient consumption experiences while enabling merchants to access robust and dependable platform services in various scenarios, such as live e-commerce and short video content e-commerce. Our vision is to make affordable and high-quality products easily accessible, enhancing the quality of life for all.
We are looking for passionate and talented people to join our product and operations team, to build an e-commerce ecosystem that is innovative, secure and intuitive for our users and brands.
Our role combines software and systems engineering disciplines to run high-performance, large-scale distributed infrastructure. This means you will be deeply involved in the developmental lifecycle of critical software services, collaborating closely with product engineers to combine software code and systems knowledge to ensure that TikTok Shop's services are reliable, fault-tolerant, efficiently scalable and cost-effective. You will also be leveraging your software engineering expertise to develop software platforms and tools to optimise the operational and engineering efficiencies of complex systems at scale, with particular focus on improving the systems' observability, performance and maintainability.
- Focused on TikTok Shop business, provide SRE solutions that cater to actual business scenarios based on cross-team, cross-timezone, and cross-region collaboration mechanisms.
- Participate in building disaster recovery capabilities for TikTok Shop, offering end-to-end disaster recovery solutions to ensure the ability to switch over during extreme failure scenarios.
- Continuously enhance the core capabilities of TikTok Shop SRE in terms of stability, efficiency, cost, and security, and participate in the operation of key metrics (including incident recall rate, SLI, MTTD, MTTR, resource utilization, etc.).
- Promote the design and implementation of operation and maintenance tools and platform solutions to improve the infrastructure capabilities of the TikTok Shop platform.
- Participate in on-call duty, respond to performance and availability issues, resolve problems, and minimize downtime as much as possible.
Qualifications
Minimum Qualifications:
- Bachelor's or higher degree in Computer Science, Information Technology, Programming & System Analysis, Science (Computer Studies) or related discipline.
- Candidate should have at least 5 years of experience in one or more programming languages (such as Java, C++, Go), or scripting experience with Shell/Python.
- Familiarity with e-commerce business, common network and access layer faults, and relevant construction experience.
- Professional knowledge in operation, deployment, high availability, and quality assurance of large-scale distributed systems, with a strong sense of responsibility and strong problem analysis and solving skills.
Job Information
About TikTok
TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and we also have offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.
Why Join Us
Inspiring creativity is at the core of TikTok's mission. Our innovative product is built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and bring joy - a mission we work towards every day. We strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We're resilient and embrace challenges as they come. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
-
Site Reliability Engineer
1 week ago
Singapore DHATCH CONSULTANCY PTE. LTD. Full time
Site Reliability Engineer: **Preferred Qualifications** - 3+ years of experience in site reliability engineering, DevOps, or software engineering roles. - Proven skills in: - Monitoring & alerting tools (Grafana, New Relic) - CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.) - Container orchestration (Docker, Kubernetes) - Infrastructure-as-code...
-
Site Reliability Engineer
2 weeks ago
Singapore Crystal Equation Corporation Full time
We are seeking a skilled Site Reliability Engineer (SRE) to join our team. SRE will be responsible for keeping all internal user-facing applications and other production systems running smoothly. This hybrid role involves a combination of both development and operations skills to build and manage systems that are both efficient and reliable. The Enterprise...
-
Site Reliability Engineer
1 week ago
Singapore The Edge Asia Full time
Our client is a US hedge fund and their Technology group is constantly improving the company’s IT infrastructure, positioning them at the forefront of a rapidly evolving technology landscape. They are a team of experts experimenting, discovering new ways to harness the power of open-source solutions, and embracing enterprise agile methodology. Their...
-
Site Reliability Engineer
2 weeks ago
Singapore Retentia technology private limited Full time
3+ years of experience in Site Reliability Engineering, DevOps, or a related field. - Strong knowledge of cloud platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes). - Experience with automation and configuration management tools (e.g., Terraform, Ansible, Chef, or Puppet). - Proficiency in at least one programming...
-
Site Reliability Engineer
2 weeks ago
Singapore DT One Full time
About DT One DT One was founded to provide mobile carriers with the infrastructure and services they need to help migrant workers stay in touch with their family and friends back home. Today we operate a leading global network for mobile top‐up solutions, innovative mobile rewards, and Phone‐to‐Phone solutions. Our global network delivers better...
-
Senior Site Reliability Engineer
1 day ago
Singapore AKAMAI TECHNOLOGIES APJ PTE. LTD. Full time
As a Senior Site Reliability Engineer, you will influence a wide array of teams. You will be responsible for the performance and reliability of Akamai’s delivery products by working with the Product, Engineering and Support teams to diagnose, mitigate and solve outages. You will have to solve some of the most complex problems in distributed systems at...
-
Site Reliability Engineer
1 week ago
Central Singapore Emprego SG Full time
**Location** Singapore, Central Singapore **Job Type** Permanent **Salary** 9,000 - 15,000 Per **Date Posted** 5 hours ago Additional Details **Job ID** 16908 **Job Views** 1 Roles & Responsibilities **Objectives of this Role** - Run the production environment by monitoring availability and taking a holistic view of system health Improve...
-
Site Reliability Engineer
1 day ago
Singapore SINGAPORE POWER LIMITED Full time
**What You'll Do**: - Evangelist for Site Reliability Engineer (SRE) practices in SP Digital (SPD) - Maintain the Reliability tools with regular patching and upgrades - Manage and evolve the full stack observability tools used in SPD - Enhance the customer experience by simplifying the onboarding process and documentation - Work with teams to improve the...
-
Site Reliability Engineer
5 days ago
Singapore GXS BANK PTE. LTD. Full time
**Job Description & Requirements**: Get to know the Role: - As a Site Reliability Engineer (SRE) you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. - Much of our support and software development focuses on optimizing existing systems, building...
-
Site Reliability Engineer
7 days ago
Singapore Rapsys Technologies Full time
**Roles and Responsibilities**: 2. Set up and operate the server infrastructure and software (Linux, Elasticsearch, Logstash, Grafana, Kibana, Kafka, Nginx) based on the bank’s and the industry’s security standards. 3. Perform continuous improvement for the platform covering areas such as: capacity planning, observability, monitoring,...