Site Reliability Engineer

15 hours ago


Singapore EXASOFT CONSULTING PTE. LTD. Full time

Responsibilities

  • Develop and oversee performance-critical infrastructure for financial markets, ensuring maximum throughput, high resiliency, and minimal operational risk.
  • Leverage deep Linux kernel expertise to fine-tune scheduling policies, interrupt routing, and NUMA resource allocation, ensuring predictable performance at scale.
  • Build and maintain high-availability containerized environments using Kubernetes, Docker, and advanced orchestration tools with a strong focus on scalability and security.
  • Lead automation initiatives with Ansible, Bash, and Python, eliminating manual intervention and improving system efficiency.
  • Manage hybrid cloud infrastructure (AWS, Azure, GCP) with strict performance SLAs, security compliance, and cost-optimized deployments.
  • Oversee infrastructure monitoring and observability using ELK Stack, Grafana, Site24x7, Splunk, and other enterprise-grade tools, ensuring proactive incident detection and resolution.
  • Administer and troubleshoot enterprise storage and networking stacks, including RAID, NFS, SAN/NAS, TCP/IP networking, VMware/vCenter, and BigIP load balancers.
  • Collaborate with development, DevOps, and security teams to design fault-tolerant systems and enforce infrastructure governance policies.
  • Execute predictive capacity modeling, OS hardening and patch compliance, coupled with benchmark-driven performance optimization for trading and real-time compute platforms.
  • Provide expert-level outage resolution, coordinating cross-functional teams to deliver sustainable remediation and operational resilience.
Requirements
  • 10+ years of progressive experience in system administration, performance engineering, and reliability operations across enterprise and financial domains.
  • Advanced proficiency in Linux internals with specialization in kernel performance tuning, NUMA-aware optimizations, and real-time workload handling.
  • Proven hands-on experience with Kubernetes, Docker, and Ansible for large-scale automation and orchestration.
  • Strong scripting/programming skills in Bash and Python, plus experience with perf/eBPF for system analysis.
  • Demonstrated expertise in cloud operations across AWS, Azure, and GCP.
  • Strong background in networking protocols (TCP/IP, FIX) and high-performance trading environments.
  • Familiarity with storage systems (SAN, NAS, RAID) and database tuning (MySQL optimization).
  • Experience implementing observability and monitoring solutions such as ELK, Grafana, Splunk, and Corvil.

  • Singapore IDEMIA Full time

    Purpose: This role plays a critical part in ensuring reliability, scalability, and performance of our systems and services. You will work closely with development and...


  • Singapore TRUEWATCH TECHNOLOGY INC PTE. LTD. Full time

    **Responsibility**: - Run the production environment by monitoring availability and taking a holistic view of system health. - Achieve site reliability automation, minimize system downtime, and reduce site reliability cost. - Manage risks and resolve issues that affect the release scope, schedule and quality. - Suggest architecture improvements, push for...


  • Singapore DHATCH CONSULTANCY PTE. LTD. Full time

    Site Reliability Engineer: **Preferred Qualifications** - 3+ years of experience in site reliability engineering, DevOps, or software engineering roles. - Proven skills in: - Monitoring & alerting tools (Grafana, New Relic) - CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.) - Container orchestration (Docker, Kubernetes) - Infrastructure-as-code...


  • Singapore JJ Consulting Services Full time

    Our client is a fast-growing company in Singapore that is seeking to recruit a Site Reliability Engineer. **Site Reliability Engineer** **Key Roles & Responsibilities** - Providing ancillary support of Enterprise-Grade Products and solutions at customer's sites - Ironing out deployment issues or challenges that our customers may face - Responsible for...


  • Singapore Beijing Foreign Enterprise Management Consultants Co.,Ltd. Full time

    On behalf of Huawei, a world-renowned information and communication technology company, we are seeking passionate and talented individuals to join our team as Site Reliability Engineer. Overview: On behalf of Huawei, a world-renowned information and communication...


  • Singapore Manpower Singapore Full time

    Site Reliability Engineer Assistant (DevOps): Responsible for the operation and maintenance of online game marketing services, to ensure the continuous...


  • Singapore The Edge Asia Full time

    Our client is a US hedge fund and their Technology group is constantly improving the company’s IT infrastructure, positioning them at the forefront of a rapidly evolving technology landscape. They are a team of experts experimenting, discovering new ways to harness the power of open-source solutions, and embracing enterprise agile methodology. Their...


  • Singapore People Profilers Full time

    Job Description: **Responsibilities**: - Support services before they go live through activities such as system design consulting and launch reviews. - Develop and maintain tools, re-designing capacity planning infrastructure for greater scalability. - Troubleshooting, diagnosing and fixing software issues. - Suggesting architecture improvements, pushing...


  • Singapore ABAXX SINGAPORE PTE. LTD. Full time

    Site Reliability Engineer - Networking: We are seeking a competent candidate to join our Infrastructure Team, whose mission is building and operating a MAS-regulated marketplace and clearing house. This role is ideal for someone with a strong foundation in AWS services, infrastructure as code, and cloud security, who is passionate about building scalable, secure,...


  • Singapore Point72 Full time

    About the role: As part of Point72's Technology Team, you will focus on developing and maintaining complex, distributed, real-time systems that support our Global Macro business. Your responsibilities will include optimizing operations through automation, building foundational SRE components,...