Global Operations Centre HPC Engineer

3 days ago


Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time

**ROLES AND RESPONSIBILITIES**
The Global Operations Centre (GOC) HPC Engineer is a technical specialist responsible for the daily operations and maintenance of the company's high-performance computing (HPC) environment. The HPC Engineer will collaborate closely with senior engineers, monitor system health, troubleshoot issues (especially those involving NVIDIA H100 GPUs, InfiniBand, and Mellanox hardware), support the Global Operations Centre, and create clear documentation to ensure smooth and efficient operations.
- Assist in the deployment, configuration, and maintenance of HPC hardware and software components.
- Monitor the health and performance of HPC systems, identifying and resolving issues proactively.
- Participate in on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
- Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
- Analyze system logs, performance data, and user reports to diagnose and resolve issues.
- Document incident details, resolutions, and lessons learned to enhance future problem-solving.
- Create and maintain comprehensive SOPs for common HPC tasks, incident response procedures, and system configurations.
- Ensure documentation is clear, accurate, and up-to-date, contributing to knowledge sharing within the team.
- Communicate effectively with the GOC team, IT stakeholders, and end users so that issues and their resolutions are clearly understood.
- Participate in team meetings, project discussions, and knowledge-sharing sessions to foster a collaborative environment.

**SKILLS AND EXPERIENCE**
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in HPC system administration, Linux/Unix environments, and troubleshooting complex technical problems.
- Strong understanding of HPC architecture, networking, storage, and job scheduling systems.
- In-depth knowledge of InfiniBand fabric topology and Mellanox hardware capabilities.
- Proficiency in Linux/Unix operating systems and command-line tools.
- Experience with scripting languages (e.g., Bash, Python) for automation and problem-solving.
- Familiarity with HPC software, administration, and tools (e.g., Slurm, Kubernetes).
- Excellent problem-solving and analytical skills.



  • Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time

    **ROLES AND RESPONSIBILITIES** Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global...


  • Singapore BYTEDANCE PTE. LTD. Full time $150,000 - $200,000 per year

    HPC Network Engineer - Physical Network Infrastructure | Singapore | Regular | R&D | Job ID: A09131. Responsibilities. About the Team: ByteDance Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed networking, to create...

  • System Engineer

    19 hours ago


    Singapore NodeFlair Full time

    **Job Summary**: **Salary** S$8,000 - S$9,000 / month; **Seniority** Mid; **Years of Experience** At least 5 years; **Tech Stacks** C++, Linux, C. Fujitsu is seeking a High-Performance Computing (HPC) Engineer. This position will participate in the support of our Linux-based high-performance computing, storage, and networking environment...


  • Singapore SMC Cloud Full time

    Overview: Data Centre Engineer, Field Operations role at SMC Cloud. SMC Cloud is seeking a skilled Data Centre Engineer to join the Operations team, supporting the daily operations and maintenance of AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams,...


  • Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time $90,000 - $120,000 per year

    **ROLES AND RESPONSIBILITIES** Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global...

  • System Engineer

    1 week ago


    Singapore Jobline Resources Pte Ltd Full time $90,000 - $120,000 per year

    Responsibilities: • Administration and operation of several HPC Linux clusters, storage, networking, and associated system and application software. • Understand and work with parallel file systems, HPC cluster management software, and HPC job scheduler software. • Troubleshooting hardware, software, operating systems, and networking as necessary. • Work...


  • Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time

    **ROLES AND RESPONSIBILITIES** Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and...


  • Singapore beBeeExpert Full time $90,000 - $120,000

    We are seeking a seasoned expert to lead the administration and operation of our Linux-based High-Performance Computing (HPC) environment. This role involves providing hands-on support for HPC system software, troubleshooting issues across hardware, software, OS, and networking layers, and collaborating with engineers to support AI/deep learning...


  • Singapore ByteDance Full time

    **HPC Network Engineer - Physical Network Infrastructure** - Singapore Regular - R&D Job ID: A09131 **Responsibilities** About the Team ByteDance Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed...


  • Singapore beBeeHighPerformance Full time $120,000 - $140,000

    Job Overview: We are seeking a highly experienced and driven High-Performance Computing (HPC) professional to support our Linux-based HPC environment. This is an exceptional opportunity for individuals with a passion for HPC system administration, user support, and collaboration with software engineers to support AI/deep learning applications and desktop...