
Global Operations Centre HPC Engineer
3 days ago
**ROLES AND RESPONSIBILITIES**
The Global Operations Centre (GOC) HPC Engineer is a technical specialist responsible for the daily operations and maintenance of the company's high-performance computing (HPC) environment. The HPC Engineer collaborates closely with senior engineers, monitors system health, troubleshoots issues (especially those related to NVIDIA H100, InfiniBand, and Mellanox hardware), assists the Global Operations Centre, and creates clear documentation to ensure smooth and efficient operations.
- Assist in the deployment, configuration, and maintenance of HPC hardware and software components.
- Monitor the health and performance of HPC systems, identifying and resolving issues proactively.
- Participate in on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
- Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
- Analyze system logs, performance data, and user reports to diagnose and resolve issues.
- Document incident details, resolutions, and lessons learned to enhance future problem-solving.
- Create and maintain comprehensive SOPs for common HPC tasks, incident response procedures, and system configurations.
- Ensure documentation is clear, accurate, and up-to-date, contributing to knowledge sharing within the team.
- Communicate effectively with the GOC team, IT stakeholders, and end-users to ensure clear understanding of issues and resolutions.
- Participate in team meetings, project discussions, and knowledge-sharing sessions to foster a collaborative environment.
**SKILLS AND EXPERIENCE**
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in HPC system administration, Linux/Unix environments, and troubleshooting complex technical problems.
- Strong understanding of HPC architecture, networking, storage, and job scheduling systems.
- In-depth knowledge of InfiniBand fabric topology and Mellanox hardware capabilities.
- Proficiency in Linux/Unix operating systems and command-line tools.
- Experience with scripting languages (e.g., Bash, Python) for automation and problem-solving.
- Familiarity with HPC software, administration, and tools (e.g., Slurm, Kubernetes).
- Excellent problem-solving and analytical skills.
-
Data Centre Engineer, Field Operations
1 week ago
Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time **ROLES AND RESPONSIBILITIES** Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global...
-
HPC Network Engineer
1 week ago
Singapore BYTEDANCE PTE. LTD. Full time $150,000 - $200,000 per year HPC Network Engineer - Physical Network Infrastructure - Singapore - Regular - R&D - Job ID: A09131 Responsibilities About the Team ByteDance Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed networking, to create...
-
System Engineer
19 hours ago
Singapore NodeFlair Full time **Job Summary**: **Salary** S$8,000 - S$9,000 / Monthly **Job Type** **Seniority** Mid **Years of Experience** At least 5 years **Tech Stacks** C++, Linux, C. Fujitsu is seeking a High-Performance Computational (HPC) Engineer. This position will participate in the support of our Linux-based high-performance computing, storage, and networking environment...
-
Data Centre Engineer, Field Operations
2 weeks ago
Singapore SMC Cloud Full time Overview: Data Centre Engineer, Field Operations role at SMC Cloud. SMC Cloud is seeking a skilled Data Centre Engineer to join the Operations team, supporting the daily operations and maintenance of AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams,...
-
Data Centre Engineer, Field Operations
1 week ago
Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time $90,000 - $120,000 per year ROLES AND RESPONSIBILITIES Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global...
-
System Engineer
1 week ago
Singapore Jobline Resources Pte Ltd Full time $90,000 - $120,000 per year Responsibilities • Administration and operation of several HPC Linux clusters, storage, networking and associated system and application software. • Understand and work with parallel file systems, HPC cluster management software, and HPC job scheduler software. • Troubleshooting hardware, software, operating systems and networking as necessary. • Work...
-
Data Centre Engineer, Field Operations
1 week ago
Singapore FIRMUS METAL INTERNATIONAL PTE. LTD. Full time ROLES AND RESPONSIBILITIES Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure. This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and...
-
HPC System Architect
5 days ago
Singapore beBeeExpert Full time $90,000 - $120,000 We are seeking a seasoned expert to lead the administration and operation of our Linux-based High-Performance Computing (HPC) environment. This role involves providing hands-on support for HPC system software, troubleshooting issues across hardware, software, OS, and networking layers, and collaborating with engineers to support AI/deep learning...
-
HPC Network Engineer
1 week ago
Singapore ByteDance Full time **HPC Network Engineer - Physical Network Infrastructure** - Singapore - Regular - R&D - Job ID: A09131 **Responsibilities** About the Team ByteDance Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed...
-
Senior HPC System Administrator
2 days ago
Singapore beBeeHighPerformance Full time $120,000 - $140,000 Job Overview We are seeking a highly experienced and driven High-Performance Computing (HPC) professional to support our Linux-based HPC environment. This is an exceptional opportunity for individuals with a passion for HPC system administration, user support, and collaboration with software engineers to support AI/deep learning applications and desktop...