Site Reliability Engineer

Singapore ByteDance Full time

[About ByteDance]
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

[About the Team]
The Datacenter Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet and provides cloud solutions and infrastructure services, ensuring that they are scalable and reliable.

[Responsibilities]
As the Site Reliability Engineer - Infrastructure Engineering, you would be responsible for at least one, if not all, of these areas:
**Infrastructure**:

- Build, expand and operate global infrastructure, including large-scale systems in public and private clouds, data centers and content delivery networks.
- Build tools, automations, visualizations and monitors to facilitate the operation and optimization of the global infrastructure.
- Help improve the whole lifecycle of infrastructure services, from inception and design through development, deployment, user support and refinement.
- Support the production environment end to end by responding to performance and reliability issues and participating in on-call rotations.

**Security**:

- Conduct security reviews of core corporate and production infrastructure.
- Carry out security updates and protect enterprise infrastructure at the system and network level.
- Drive enterprise-focused security improvements to products and services.
- Build security tools and processes for critical infrastructure protection, monitoring and remediation.

**Traffic**:

- Build tools, automations, visualizations and monitors to facilitate the operation and optimization of the traffic infrastructure.
- Provide primary operational support and engineering for traffic infrastructure systems.
- Gather and analyze metrics to assist in performance tuning and fault finding.

[Minimum Qualifications]
- Bachelor’s degree in Computer Science or equivalent with 3+ years of relevant experience.
- Experience in one or more programming languages such as Java, Python, C++ or Go, or scripting experience in Shell and Python.
- Ability to thrive in a fast-paced environment.
- Relevant experience working in a datacenter or in a large-scale, high-traffic infrastructure environment.

As a Site Reliability Engineer with the Infrastructure Engineering team, you would also be expected to be an expert in at least one, if not all, of these areas:
**Infrastructure**:

- Experience working with cloud infrastructure.
- Experience in building solutions with AWS, Google Cloud, Azure and other cloud services.
- Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.
- Experience working with Unix/Linux systems, from kernel to shell and beyond.
- Experience working with system libraries, file systems, and client-server protocols.
- Experience in designing, analyzing, and building automation and tools for large-scale systems.
- Experience with networking technologies such as TCP/IP, BGP and DNS in a carrier-grade environment.

**Security**:

- Experience in network security, such as DDoS protection and WAFs.
- Experience with security protocols such as TLS, including its features and updates.
- Experience with VPNs and building encrypted communication channels.
- Experience conducting infrastructure security reviews and patching potential security vulnerabilities.
- Experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python.

**Traffic**:

- Experience working with traffic systems, from CDNs to load balancers and beyond.
- Experience working with network devices, remote management systems, and client-server protocols.
- Knowledge of network infrastructure and/or routing.
- Experience with Layer 4 / Layer 7 load balancers.
- Knowledge of protocols such as TCP/IP, HTTP, RPC and TLS.
- Experience working with containerized environments.
- Experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python.



  • Singapore Sea Limited Full time

    Engineering and Technology - Infrastructure, Singapore - Entry Level. Our DevOps Engineering team plays an important role in developing and maintaining the internal systems and tools for the Infrastructure team. As a Site Reliability Engineer, you are responsible for improving the availability and reliability of our Infrastructure services. - Responsible for...


  • Singapore Hyphen Connect Full time

    Site Reliability Engineer (Crypto Trading). We are hiring for one of our ecosystem projects in...


  • Singapore TRUEWATCH TECHNOLOGY INC PTE. LTD. Full time

    **Responsibilities**: - Run the production environment by monitoring availability and taking a holistic view of system health. - Automate site reliability work, minimize system downtime, and reduce site reliability costs. - Manage risks and resolve issues that affect the release scope, schedule and quality. - Suggest architecture improvements, push for...


  • Singapore TEAMLEASE DIGITAL CONSULTING PTE. LTD. Full time

    As a Site Reliability Engineer, you will be filling a mission-critical role ensuring that our systems are healthy, monitored, automated, fault-tolerant and designed to scale. You will collaborate and work closely with engineering teams to continually improve our production services, facilitating fast delivery of new products, and reducing downtime. Key...


  • Singapore HCLTech Full time

    This role combines software and systems engineering to build, run, and maintain high-performance, distributed, fault-tolerant and resilient financial systems. Site Reliability Engineers focus on ensuring a joyful customer journey. As a Site Reliability Engineer, you will be filling a...


  • Singapore Vega Solutions Full time

    Tokka Labs | Singapore | Full-Time. Tokka Labs is a proprietary trading firm with a focus on close collaboration, rigorous research, and cutting-edge...


  • Singapore Tardis Group Full time

    About the Company: A rapidly growing technology firm operating at the forefront of artificial intelligence and advanced software solutions. The company fosters a fast-paced, collaborative, and innovation-driven culture, uniting talent across...


  • Singapore JJ Consulting Services Full time

    Our client is a fast-growing company in Singapore seeking to recruit a Site Reliability Engineer. **Key Roles & Responsibilities**: - Providing ancillary support for enterprise-grade products and solutions at customer sites. - Ironing out deployment issues or challenges that our customers may face. - Responsible for...


  • Singapore Qlik Full time

    **What makes us Qlik?** A Gartner® Magic Quadrant Leader for 14 years in a row, Qlik transforms complex data landscapes into actionable insights, driving strategic business outcomes. Serving over 40,000 global customers, our portfolio leverages pervasive data quality and advanced AI/ML capabilities that lead to better decisions, faster. We excel in...