
Site Reliability Engineer
8 hours ago
Recruiter at Tardis Group | Finding Top Talent in Tech & Quant
About the Company
A rapidly growing technology firm operating at the forefront of artificial intelligence and advanced software solutions. The company fosters a fast-paced, collaborative, and innovation-driven culture, uniting talent across engineering, research, and product teams to create impactful solutions. This role offers the opportunity to work on exciting projects, leverage cutting-edge technologies, and make a real difference in the AI and mobile development space.
Key Responsibilities
Cluster Operations & Management
- Manage and maintain container clusters (e.g., Kubernetes, Docker) and open-source component clusters (e.g., Kafka, Redis, Elasticsearch) across multiple environments and business units.
- Monitor and optimize distributed systems to ensure high performance, scalability, and reliability.
Infrastructure Platform Development
- Design, build, and improve infrastructure operations platforms.
- Develop and maintain solutions for infrastructure management, CI/CD pipelines, monitoring and alerting systems, and centralized logging.
- Lead platform standardization efforts and drive automation to streamline operations.
High Availability & Reliability
- Ensure maximum uptime for production services through proactive monitoring, rapid incident response, and root cause analysis.
- Continuously refine service architecture, deployment strategies, and operational processes for improved resilience.
- Implement and maintain SLA/SLO frameworks, applying reliability engineering best practices.
Automation & Process Improvement
- Develop automated systems for operations and maintenance to minimize manual intervention.
- Create self-service tools and workflows to boost team productivity.
- Define and enforce best practices for infrastructure-as-code, configuration management, and change control.
Required Qualifications
Experience & Education
- Minimum 2 years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE).
- Bachelor's degree in Computer Science, Engineering, or a related technical discipline preferred.
- Familiarity with public cloud platforms (AWS, Azure, or GCP) is highly valued.
- Strong understanding of large-scale internet architectures and distributed systems.
- Proven experience with infrastructure monitoring, logging, and observability tools.
Technical Skills
- Proficiency in scripting and automation (e.g., Shell, Python).
- Strong knowledge of containerization technologies (Kubernetes, Docker).
- Hands-on experience managing production-grade container clusters and maintaining CI/CD pipelines.
- Familiarity with infrastructure components such as Nginx, MySQL, Redis, Kafka, and Elasticsearch.
Advanced Networking (Preferred)
- Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies.
- Understanding of network security, load balancing, and traffic management.
- Knowledge of cloud-native networking patterns and best practices.
If you're ready to make an impact in a role that combines software development with cutting-edge AI, we encourage you to apply. Please note that only shortlisted candidates will be contacted.
CEI: 23S1921
Seniority level
Associate
Employment type
Full-time
Job function
Information Technology
Industries
Software Development