Site Reliability Engineer

1 week ago


Singapore EXASOFT CONSULTING PTE. LTD. Full time

**Responsibilities**
- Develop and oversee performance-critical infrastructure for financial markets, ensuring maximum throughput, high resiliency, and minimal operational risk.
- Leverage deep Linux kernel expertise to fine-tune scheduling policies, interrupt routing, and NUMA resource allocation, ensuring predictable performance at scale.
- Build and maintain high-availability containerized environments using Kubernetes, Docker, and advanced orchestration tools with a strong focus on scalability and security.
- Lead automation initiatives with Ansible, Bash, and Python, eliminating manual intervention and improving system efficiency.
- Manage hybrid cloud infrastructure (AWS, Azure, GCP) with strict performance SLAs, security compliance, and cost-optimized deployments.
- Oversee infrastructure monitoring and observability using ELK Stack, Grafana, Site24x7, Splunk, and other enterprise-grade tools, ensuring proactive incident detection and resolution.
- Administer and troubleshoot enterprise storage and networking stacks such as RAID, NFS, SAN/NAS, TCP/IP networking, VMware/vCenter, and BigIP load balancers.
- Collaborate with development, DevOps, and security teams to design fault-tolerant systems and enforce infrastructure governance policies.
- Execute predictive capacity modeling, OS hardening, and patch compliance, coupled with benchmark-driven performance optimization for trading and real-time compute platforms.
- Provide expert-level outage resolution, coordinating cross-functional teams to deliver sustainable remediation and operational resilience.

**Requirements**
- 10+ years of progressive experience in system administration, performance engineering, and reliability operations across enterprise and financial domains.
- Advanced proficiency in Linux internals with specialization in kernel performance tuning, NUMA-aware optimizations, and real-time workload handling.
- Proven hands-on experience with Kubernetes, Docker, and Ansible for large-scale automation and orchestration.
- Strong scripting/programming skills in Bash and Python, plus experience with perf/eBPF for system analysis.
- Demonstrated expertise in cloud operations across AWS, Azure, and GCP.
- Strong background in networking protocols (TCP/IP, FIX) and high-performance trading environments.
- Familiarity with storage systems (SAN, NAS, RAID) and database tuning (MySQL optimization).
- Experience implementing observability and monitoring solutions such as ELK, Grafana, Splunk, and Corvil.



  • Singapore IDEMIA Full time

    Purpose: This role plays a critical part in ensuring reliability, scalability, and performance of our systems and services. You will work closely with development and...


  • Singapore beBeeSiteReliability Full time $90,000 - $120,000

    Unlock Your Full Potential in Site Reliability Engineering. About the Role: This is an exciting opportunity to work with a global banking institution, leveraging your skills in production management and site reliability engineering to drive business growth. Develop and implement proactive, predictive models for shift production management using SRE...


  • Singapore HCLTech Full time

    This role combines software and systems engineering to build, run, and maintain highly performant, distributed, fault-tolerant, and resilient financial systems. Site Reliability Engineers focus on ensuring a joyful customer journey. As a Site Reliability Engineer you will be filling a...


  • Singapore DHATCH CONSULTANCY PTE. LTD. Full time

    Site Reliability Engineer: **Preferred Qualifications** - 3+ years of experience in site reliability engineering, DevOps, or software engineering roles. - Proven skills in: - Monitoring & alerting tools (Grafana, New Relic) - CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.) - Container orchestration (Docker, Kubernetes) - Infrastructure-as-code...


  • Singapore Tardis Group Full time

    About the Company: A rapidly growing technology firm operating at the forefront of artificial intelligence and advanced software solutions. The company fosters a fast-paced, collaborative, and innovation-driven culture, uniting talent across...


  • Singapore ByteDance Full time

    Site Reliability Engineer - Privacy & Security - Singapore. Posted 4 days ago. Responsibilities: Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen...