Site Reliability Engineer
2 weeks ago
[About ByteDance]
Founded in 2012, ByteDance has a mission to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
[About the Team]
The Datacenter Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet and provides cloud solutions and infrastructure services that are scalable and reliable.
[Responsibilities]
As the [Site Reliability Engineer - Infrastructure Engineering], you would be responsible for at least one, if not all, of these areas:
**Infrastructure**:
- Build, expand and operate global infrastructure, including large-scale systems in public and private clouds, data centers and content delivery networks.
- Build tools, automations, visualizations and monitors to facilitate the operation and optimization of the global infrastructure.
- Help improve the whole lifecycle of infrastructure services, from inception and design through development, deployment, user support and refinement.
- Support the production environment end to end by responding to performance and reliability issues and participating in on-call rotations.
**Security**:
- Conduct security reviews of core corporate and production infrastructure.
- Carry out security updates and protect enterprise infrastructure at the system and network level.
- Drive enterprise-focused security improvements to products and services.
- Build security tools and processes for critical infrastructure protection, monitoring and remediation.
**Traffic**:
- Build tools, automations, visualizations and monitors to facilitate the operation and optimization of the traffic infrastructure.
- Provide primary operational support and engineering for traffic infrastructure systems.
- Gather and analyze metrics to assist in performance tuning and fault finding.
[Minimum Qualifications]
- Bachelor’s degree in Computer Science or equivalent with 3+ years of relevant experience.
- Experience in one or more programming languages such as Java, Python, C++, Go, or scripting experience in Shell and Python.
- Ability to thrive in a fast-paced environment.
- Relevant experience working in a datacenter or a similar environment with large-scale, high-traffic infrastructure.
As a Site Reliability Engineer with the Infrastructure Engineering team, you would also be expected to be an expert in at least one, if not all, of these areas:
**Infrastructure**:
- Experience working with cloud infrastructure.
- Experience in building solutions with AWS, Google, Azure and other cloud services.
- Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.
- Experience working with Unix/Linux systems, from kernel to shell and beyond.
- Experience working with system libraries, file systems, and client-server protocols.
- Experience in designing, analyzing, and building automation and tools for large scale systems.
- Experience in networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment.
**Security**:
- Experience in network security, such as DDoS and WAF protection.
- Experience with security protocols such as TLS, including protocol features and updates.
- Experience with VPNs and building encrypted communication channels.
- Experience conducting infrastructure security reviews and patching and updating potential security vulnerabilities.
- Experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python.
**Traffic**:
- Experience working with traffic systems from CDNs to load balancers and beyond.
- Experience working with network devices, remote management systems, and client-server protocols.
- Knowledge of network infrastructure and/or routing.
- Experience with Layer 4 / Layer 7 load balancers.
- Knowledge of protocols like TCP/IP, HTTP, RPC, TLS, etc.
- Experience working with containerized environments.
- Experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python.