Global Operations Centre HPC Engineer
1 week ago
**ROLES AND RESPONSIBILITIES**
The Global Operations Centre (GOC) HPC Engineer is a technical specialist responsible for the daily operations and maintenance of the company's high-performance computing (HPC) environment. The HPC Engineer collaborates closely with senior engineers, monitors system health, troubleshoots issues (especially those related to NVIDIA H100, InfiniBand, and Mellanox), assists the Global Operations Centre, and creates clear documentation to ensure smooth and efficient operations.
- Assist in the deployment, configuration, and maintenance of HPC hardware and software components.
- Monitor the health and performance of HPC systems, identifying and resolving issues proactively.
- Participate in the on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
- Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
- Analyze system logs, performance data, and user reports to diagnose and resolve issues.
- Document incident details, resolutions, and lessons learned to enhance future problem-solving.
- Create and maintain comprehensive SOPs for common HPC tasks, incident response procedures, and system configurations.
- Ensure documentation is clear, accurate, and up-to-date, contributing to knowledge sharing within the team.
- Communicate effectively with the GOC team, IT stakeholders, and end-users to ensure clear understanding of issues and resolutions.
- Participate in team meetings, project discussions, and knowledge-sharing sessions to foster a collaborative environment.
**SKILLS AND EXPERIENCE**
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in HPC system administration, Linux/Unix environments, and troubleshooting complex technical problems.
- Strong understanding of HPC architecture, networking, storage, and job scheduling systems.
- In-depth knowledge of InfiniBand fabric topology and Mellanox hardware capabilities.
- Proficiency in Linux/Unix operating systems and command-line tools.
- Experience with scripting languages (e.g., Bash, Python) for automation and problem-solving.
- Familiarity with HPC software, administration, and tools (e.g., Slurm, Kubernetes).
- Excellent problem-solving and analytical skills.
-
HPC System Engineer-Software
4 days ago
Singapore Quest Global Full time
At Quest Global, it's not just what we do but how and why we do it that makes us different. With over 25 years as an engineering services provider, we believe in the power of doing things differently to make the...
-
System Engineer
7 days ago
Singapore NodeFlair Full time
**Job Summary**:
**Salary** S$8,000 - S$9,000 / Monthly
**Job Type**
**Seniority** Mid
**Years of Experience** At least 5 years
**Tech Stacks** C++, Linux, C
Fujitsu is seeking a High-Performance Computational (HPC) Engineer. This position will participate in the support of our Linux-based high-performance computing, storage, and networking environment...
-
Field Application Engineer
4 days ago
Singapore Advanced Micro Devices Full time
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create...
-
HPC Domain Specialist
2 days ago
Singapore A*STAR RESEARCH ENTITIES Full time
ABOUT THE ROLE
Be the bridge between groundbreaking research and Singapore's new supercomputer system. As our CFD Domain Specialist, you will be the national-level expert enabling key sectors like advanced manufacturing, aerospace, and urban sustainability. Your work will directly support researchers as they tackle Singapore's most complex engineering challenges...
-
HPC Domain Specialist
2 days ago
Singapore A*STAR Research Full time
ABOUT THE ROLE
Be the bridge between groundbreaking research and Singapore's new supercomputer system. As our CFD Domain Specialist, you will be the national-level expert enabling key sectors like advanced manufacturing, aerospace, and urban sustainability. Your work will directly support researchers as they tackle Singapore's most complex engineering challenges...
-
HPC Network Engineer
2 days ago
Singapore ByteDance Full time
HPC Network Engineer - Physical Network Infrastructure
Responsibilities
Responsible for the design, implementation, and operation of ByteDance's global high-performance computing (HPC) networks. Work with cross-functional teams, including but not limited to machine learning (ML), compute, and storage, to drive the innovation and evolution of the HPC network....
-
HPC Middleware Engineer, System, NSCC
4 days ago
Singapore Agency for Science, Technology and Research (A*STAR) Full time
Job Summary
The HPC Middleware Engineer is responsible for deploying, optimizing, and supporting middleware components in a high-performance computing (HPC) environment. This includes scientific libraries, compilers, runtime environments, and container technologies that bridge system software and user applications. The role supports efficient application...
-
HPC Storage Engineer
6 days ago
Singapore A*STAR - Agency for Science, Technology and Research Full time
HPC Storage Engineer (System), NSCC
Job Summary: The HPC Storage Engineer will be responsible for managing the...
-
Field Application Engineer
6 days ago
Singapore AMD Full time
Field Application Engineer - HPC
Overview
AMD's mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from...
-
HPC Middleware Engineer, System, NSCC
4 days ago
Singapore A*STAR Research Full time
Job Summary
The HPC Middleware Engineer is responsible for deploying, optimizing, and supporting middleware components in a high-performance computing (HPC) environment. This includes scientific libraries, compilers, runtime environments, and container technologies that bridge system software and user applications. The role supports efficient application...