
High-Performance Computing Senior Engineer
1 day ago
JOB DESCRIPTION
DSO National Laboratories (DSO) is Singapore's largest defence research and development (R&D) organisation, with the critical mission to develop technological solutions to sharpen the cutting edge of Singapore's national security. At DSO, you will develop more than just a career. This is where you will make a real impact and shape the future of defence across the spectrum of air, land, sea, space and cyberspace.
The Digital Division leads the digital transformation of DSO through master planning and policy, delivers digital capabilities through IT infrastructure, and provides a one-stop service to the corporate and R&D Divisions. The Digital Division will transform the way we work, our workplace, and the capabilities we deliver to the MINDEF/SAF and for the security of Singapore.
People are DSO's greatest asset. You will get to realise your career aspirations and develop your own niche either as a deep technical expert or a leader in the team. With frequent career dialogues and a robust training and development framework, we will provide you with the necessary development tools for you to reach your potential. You will also be recognised and rewarded through competitive remuneration packages and scholarship opportunities.
High-Performance Computing Senior Engineer
In this role, you will:
Ensure the reliable operation of the central GPU clusters used for AI training and of the High-Performance Computing (HPC) clusters
Advise Users on workload execution and optimization strategies
Support Users in obtaining the resources they need
Support the maintenance and troubleshooting of AI and HPC infrastructure to ensure system stability, and work with the OEM vendor on troubleshooting and part replacements
Manage day-to-day operations of the GPU cluster, HPC cluster, distributed storage system and other associated IT infrastructure (e.g. head nodes)
JOB REQUIREMENTS
Degree in Computer Engineering / Computer Science
Experience with HPC scheduling and workload management tools (experience with Run.AI and SLURM preferred)
Experience in managing parallel file systems (e.g., Lustre), with a strong understanding of HPC storage principles
Experience with cluster management software (e.g., BCM)
Proficient in Python and Bash scripting for automation tasks
Experience with container technologies (e.g., Docker); container orchestration using Kubernetes is a plus
Understanding of basic network protocols (e.g., DHCP, DNS, SSH, SCP, SMTP)
Proficient in UNIX/Linux operating systems (e.g., Ubuntu, Red Hat) and command-line interfaces
Familiar with monitoring tools (e.g., Prometheus, Grafana, PRTG, Environet)
Good knowledge and experience in HPC performance optimization and troubleshooting
Proven working knowledge of HPC systems and software
Strong programming skills in Python and Bash scripting
Familiarity with HPC schedulers (e.g., SLURM), container orchestration (e.g., Kubernetes), and GPU-based systems
SKILLS
PARALLEL COMPUTING
DISTRIBUTED SYSTEMS
CLUSTER MANAGEMENT
EXPERIENCE
5 ~ 10 years
DIVISION
DIGITAL
TYPE
PERMANENT
FIELD
SOFTWARE DEVELOPMENT
-
High-Performance Computing Senior Engineer
1 week ago
Singapore DSO National Laboratories Full time
JOB DESCRIPTION DSO National Laboratories (DSO) is Singapore's largest defence research and development (R&D) organisation, with the critical mission to develop technological solutions to sharpen the cutting edge of Singapore's national security. At DSO, you will develop more than just a career. This is where you will make a real impact and shape the future...
-
High-Performance Computing Engineer
5 days ago
Singapore DSO National Laboratories Full time
JOB DESCRIPTION DSO National Laboratories (DSO) is Singapore's largest defence research and development (R&D) organisation, with the critical mission to develop technological solutions to sharpen the cutting edge of Singapore's national security. At DSO, you will develop more than just a career. This is where you will make a real impact and shape the future...
-
Senior SRE
1 week ago
Singapore Oxford Knight Full time
Senior SRE (High Performance Computing) | Singapore or Hong Kong
Salary: up to 250-275k SGD base
Summary: High-frequency prop trading firm with offices worldwide looking for a skilled Senior Site Reliability Engineer to join their High Performance Computing team, developing and supporting their large-scale compute and storage...
-
High-Performance Computing Engineer
2 days ago
Singapore beBeeDevelopment Full time $80,000 - $120,000
AI Developer: We are seeking a highly skilled AI developer to join our team. The ideal candidate will have expertise in developing high-performance simulation models and optimization strategies for NPU HW architecture. Responsibilities: Develop performance analysis models to identify hardware bottlenecks and propose architectural improvements. Build...
-
High-Performance Computing
4 days ago
Singapore GOLDTECH RESOURCES PTE LTD Full time
Roles & Responsibilities
Summary: We are seeking a highly experienced and driven High-Performance Computing (HPC) Engineer or Scientist to support our Linux-based HPC environment which includes compute clusters, parallel storage and high-speed networking used by researchers, staff and students. This role also involves customer-facing responsibilities including...
-
High-Performance Computing Specialist
3 days ago
Singapore beBeeHighPerformanceComputing Full time $120,000 - $180,000
High-Performance Computing Expert: We are seeking a highly experienced and driven Expert in High-Performance Computing to support our Linux-based HPC environment. Key Responsibilities: Lead the administration and operation of HPC Linux clusters, storage systems, and high-speed networks. Provide hands-on support for HPC system software, including cluster...
-
High-Performance Computing Specialist
9 hours ago
Singapore beBeeSystem Full time $80,000 - $120,000
Job Title: Infrastructure Engineer. We are seeking a skilled professional to join our team as an Infrastructure Engineer. As part of this role, you will play a key part in designing and developing high-performance computing frameworks and storage systems. You will also be responsible for building and maintaining online services for our recommendation system,...
-
Senior SRE
1 week ago
Singapore Oxford Knight Full time
Salary: up to 250-275k SGD base
Summary: High-frequency prop trading firm with offices worldwide looking for a skilled Senior Site Reliability Engineer to join their High Performance Computing team, developing and supporting their large-scale compute and storage platform. This platform is designed to solve demanding problems - both business...
-
High-Performance Computing Expert
17 hours ago
Singapore beBeeSoftwareDevelopment Full time $90,000 - $120,000
About Us: We are a leading organization in the field of defense research and development, committed to developing technological solutions to enhance national security. In this role, you will have the opportunity to make a real impact and shape the future of our organization across various domains. Our team is passionate about delivering digital capabilities...
-
Singapore beBeeComputing Full time
Job Summary: We are seeking a High-Performance Computing Engineer to join our team. As a key member of our organization, you will be responsible for ensuring the reliable operations of central GPU Clusters used for AI training and High-Performance Computing (HPC) Clusters. You will also advise users on workload execution and optimization strategies, provide...