Site Reliability Engineer

1 week ago

Singapore HCLTech Full time


This role combines software and systems engineering to build, run, and maintain high-performance, distributed, fault-tolerant and resilient financial systems. Site Reliability Engineers focus on ensuring a joyful customer journey.

As a Site Reliability Engineer, you will fill a mission-critical role, ensuring that the systems are healthy, monitored, automated, fault-tolerant and designed to scale.

You will collaborate and work closely with engineering teams to continually improve production services, facilitating fast delivery of new products, and reducing downtime.

Drive the Site Reliability Engineering agenda to improve the availability, reliability, and performance of services.
Drive observability for our applications.
Drive optimise-operate initiatives, for example the reduction of operational toil.
Work with application teams to set up SLIs, SLOs and error budgets for their applications.
Work with the enterprise team to deploy SRE enablers/initiatives.

Requirements

At least 6-8 years of IT experience, with at least 3 years in a project deployment capacity, preferably gained in an IT banking or system integrator environment.
The candidate should have knowledge of leveraging LLMs and deploying solutions for different Gen-AI use cases.
The candidate should have a strong infrastructure/technical background with knowledge of Open Systems platforms, and moderate information security knowledge.
Have a good understanding of ITIL and SRE processes and practices.
Have good leadership skills in working with application teams and service providers to define infrastructure deployment plans, cutover/migration strategies and test plans.
Able to formulate and establish infrastructure deployment standards.
Good people management, vendor management and project management skills
Agile and AWS certifications preferred.
Able to create Bash/Python scripts for infra deployment.
Must be able to practice SRE and Chaos Engineering principles.
Understands key SRE concepts such as Toil, SLI, SLO, Error Budgets, MTTD, MTTR, etc.
Strong, committed, and reliable team player, able to take direction but also willing to contribute to discussions on design and strategy.
Possess strong interpersonal and communication skills, with the ability to form good relationships with other technology teams through day-to-day support and project work.
Strong background in machine learning and deep learning algorithms.
Proficiency in Python for developing Gen-AI models.
Ability to design and implement scalable and efficient AI systems.
Skills in data preprocessing and feature engineering for AI model training.
Ability to stay updated with the latest advancements in generative AI research and incorporate them into work.
Expert-level knowledge of different operating systems (AIX, Linux, Wintel, Solaris) for BAU support, upgrades and maintenance.
Knowledge of OS security and hardening.
Knowledge of, or hands-on experience with, patch management.
In-depth knowledge of LVM, SAN allocation and file system extension, and of creating new file systems in cluster and non-cluster environments.
ESXi, vSphere systems administration and support including vMotion, HA, DRS, vCenter Operations Manager, vCenter Service Manager, vCenter Configuration Manager, Site Recovery Manager.
Administering cloud-based and OpenShift-based infrastructure deployments. Administration tasks include provisioning and de-provisioning of resources.
Support audits, infrastructure/network security scans, Disaster Recovery and security-related drills.
Capacity review & performance management across all platform systems.
Knowledge of middleware components such as JBoss, Apache, WebSphere Application Server and MQ.
Knowledge of the SSL certificate procurement and renewal process.
Knowledge of MariaDB, Oracle and DB2 databases: backups, DB restarts, access issues and DB upgrade support.
Very good understanding of SAN configuration of EMC/Hitachi LUNs on UNIX (AIX/Solaris/Linux) servers.
Manage firewall, GTM and LTM configuration requests.
Ability to develop simple/complex shell scripts as per requirements and for automation.
Effective in dealing with crisis calls / critical issues for business-critical services.
Proven experience in technically guiding teams in a productivity-driven environment.
Experience in at least two areas of IT infrastructure support, i.e. Production Support, Application Support and Infrastructure Support.
Explore, learn and deploy new technologies that will help the company to reduce cost or improve operational efficiencies.
Excellent troubleshooting and analytical skills
Communication and interpersonal skills.
Able to work across cultures and to work 24/7.

Secondary Skills: Unix, Chaos Engineering, DB/MQ Administration, Network (DNS, Firewall, GTM/LTM, VLAN).

Seniority level: Mid-Senior level

Employment type: Contract

Job function: Information Technology

Industries: IT Services and IT Consulting


Site Reliability Engineer

1 week ago

Singapore Sea Limited Full time

Engineering and Technology - Infrastructure, Singapore - Entry Level. Our DevOps Engineering team plays an important role in developing and maintaining the internal systems and tools for the Infrastructure team. As a Site Reliability Engineer, you are responsible for improving the availability and reliability of our Infrastructure services. - Responsible for...
Site Reliability Engineer

1 week ago

Singapore Hyphen Connect Full time

Site Reliability Engineer (Crypto Trading). We are hiring for one of our ecosystem projects in...
Site Reliability Engineer

4 weeks ago

Singapore Vega Solutions Full time

Tokka Labs | Singapore | Full-Time. Tokka Labs is a proprietary trading firm with a focus on close collaboration, rigorous research, and cutting-edge...
Site Reliability Engineer

2 weeks ago

Singapore DHATCH CONSULTANCY PTE. LTD. Full time

Site Reliability Engineer: **Preferred Qualifications** - 3+ years of experience in site reliability engineering, DevOps, or software engineering roles. - Proven skills in: - Monitoring & alerting tools (Grafana, New Relic) - CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.) - Container orchestration (Docker, Kubernetes) - Infrastructure-as-code...
Site Reliability Engineer

1 day ago

Singapore TRUEWATCH TECHNOLOGY INC PTE. LTD. Full time

**Responsibility**: - Run production environment by monitoring availability and taking a holistic view of the system health. - Achieve site reliability automation, minimize system downtime, and reduce site reliability cost. - Manage risks and resolve issues that affect the release scope, schedule and quality. - Suggest architecture improvements, push for...
Site Reliability Engineer

3 days ago

Singapore TEAMLEASE DIGITAL CONSULTING PTE. LTD. Full time

As a Site Reliability Engineer, you will be filling a mission-critical role ensuring that our systems are healthy, monitored, automated, fault-tolerant and designed to scale. You will collaborate and work closely with engineering teams to continually improve our production services, facilitating fast delivery of new products, and reducing downtime. Key...
Site Reliability Engineer

1 week ago

Singapore Tardis Group Full time

About the Company: A rapidly growing technology firm operating at the forefront of artificial intelligence and advanced software solutions. The company fosters a fast-paced, collaborative, and innovation-driven culture, uniting talent across...
Site Reliability Engineer

2 days ago

Singapore JJ Consulting Services Full time

Our Client is a fast growing company in Singapore, who is seeking to recruit a Site Reliability Engineer. **Site Reliability Engineer** **Key Roles & Responsibilities** - Providing ancillary support of Enterprise-Grade Products and solutions at customer's sites - Ironing out deployment issues or challenges that our customers may face - Responsible for...
