Global Operations Centre HPC Engineer

1 week ago


Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time

**ROLES AND RESPONSIBILITIES**
The Global Operations Centre (GOC) HPC Engineer is a technical specialist responsible for the daily operations and maintenance of the company's high-performance computing (HPC) environment. The HPC Engineer collaborates closely with senior engineers, monitors system health, troubleshoots issues (especially those related to NVIDIA H100, InfiniBand, and Mellanox hardware), assists the Global Operations Centre, and creates clear documentation to ensure smooth and efficient operations.
- Assist in the deployment, configuration, and maintenance of HPC hardware and software components.
- Monitor the health and performance of HPC systems, identifying and resolving issues proactively.
- Participate in the on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
- Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
- Analyze system logs, performance data, and user reports to diagnose and resolve issues.
- Document incident details, resolutions, and lessons learned to enhance future problem-solving.
- Create and maintain comprehensive SOPs for common HPC tasks, incident response procedures, and system configurations.
- Ensure documentation is clear, accurate, and up-to-date, contributing to knowledge sharing within the team.
- Communicate effectively with the GOC team, IT stakeholders, and end-users to ensure clear understanding of issues and resolutions.
- Participate in team meetings, project discussions, and knowledge-sharing sessions to foster a collaborative environment.

**SKILLS AND EXPERIENCE**
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in HPC system administration, Linux/Unix environments, and troubleshooting complex technical problems.
- Strong understanding of HPC architecture, networking, storage, and job scheduling systems.
- In-depth knowledge of InfiniBand fabric topology and Mellanox hardware capabilities.
- Proficiency in Linux/Unix operating systems and command-line tools.
- Experience with scripting languages (e.g., Bash, Python) for automation and problem-solving.
- Familiarity with HPC software, administration, and tools (e.g., Slurm, Kubernetes).
- Excellent problem-solving and analytical skills.



  • Singapore Quest Global Full time

    HPC System Engineer-Software (1 week ago). At Quest Global, it's not just what we do but how and why we do it that makes us different. With over 25 years as an engineering services provider, we believe in the power of doing things differently to make the...

  • System Engineer

    7 days ago


    Singapore NodeFlair Full time

    **Job Summary**:
    **Salary** S$8,000 - S$9,000 / Monthly
    **Job Type**
    **Seniority** Mid
    **Years of Experience** At least 5 years
    **Tech Stacks** C++, Linux, C

    Fujitsu is seeking a High-Performance Computational (HPC) Engineer. This position will participate in the support of our Linux-based high-performance computing, storage, and networking environment...


  • Singapore Advanced Micro Devices Full time

    WHAT YOU DO AT AMD CHANGES EVERYTHING At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create...


  • Singapore A*STAR RESEARCH ENTITIES Full time

    ABOUT THE ROLE Be the bridge between groundbreaking research and Singapore's new supercomputer system. As our CFD Domain Specialist, you will be the national-level expert enabling key sectors like advanced manufacturing, aerospace, and urban sustainability. Your work will directly support researchers as they tackle Singapore's most complex engineering challenges...


  • Singapore A*STAR Research Full time

    ABOUT THE ROLE Be the bridge between groundbreaking research and Singapore's new supercomputer system. As our CFD Domain Specialist, you will be the national-level expert enabling key sectors like advanced manufacturing, aerospace, and urban sustainability. Your work will directly support researchers as they tackle Singapore's most complex engineering challenges...


  • Singapore ByteDance Full time

    HPC Network Engineer - Physical Network Infrastructure. Responsibilities: Responsible for the design, implementation, and operation of ByteDance's global high-performance computing (HPC) networks. Work with cross-functional teams, including but not limited to machine learning (ML), compute, and storage, to drive the innovation and evolution of the HPC network....


  • Singapore Agency for Science, Technology and Research (A*STAR) Full time

    Job Summary The HPC Middleware Engineer is responsible for deploying, optimizing, and supporting middleware components in a high-performance computing (HPC) environment. This includes scientific libraries, compilers, runtime environments, and container technologies that bridge system software and user applications. The role supports efficient application...


  • Singapore A*STAR - Agency for Science, Technology and Research Full time

    HPC Storage Engineer (System), NSCC (2 days ago). Job Summary: The HPC Storage Engineer will be responsible for managing the...


  • Singapore AMD Full time

    Field Application Engineer - HPC. Overview: AMD's mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from...


  • Singapore A*STAR Research Full time

    Job Summary The HPC Middleware Engineer is responsible for deploying, optimizing, and supporting middleware components in a high-performance computing (HPC) environment. This includes scientific libraries, compilers, runtime environments, and container technologies that bridge system software and user applications. The role supports efficient application...