High-Performance Computing Engineer

5 days ago


Singapore DSO National Laboratories Full time

JOB DESCRIPTION
DSO National Laboratories (DSO) is Singapore's largest defence research and development (R&D) organisation, with the critical mission to develop technological solutions to sharpen the cutting edge of Singapore's national security. At DSO, you will develop more than just a career. This is where you will make a real impact and shape the future of defence across the spectrum of air, land, sea, space and cyberspace.
The Digital Division leads DSO's digital transformation through master planning and policies, delivers digital capabilities through IT infrastructure, and provides a one-stop service to corporate and R&D Divisions. The Digital Division will transform the way we work, our workplace, and the capabilities we deliver to MINDEF/SAF and for the security of Singapore.
People are DSO's greatest asset. You will get to realise your career aspirations and develop your own niche either as a deep technical expert or a leader in the team. With frequent career dialogues and a robust training and development framework, we will provide you with the necessary development tools for you to reach your potential. You will also be recognised and rewarded through competitive remuneration packages and scholarship opportunities.
High-Performance Computing Engineer (3 Year Contract)
In this role, you will:
Ensure the reliable operation of the central GPU clusters used for AI training and the High-Performance Computing (HPC) clusters
Advise users on workload execution and optimization strategies
Provide users with support for the resources they need
Support the maintenance and troubleshooting of AI and HPC infrastructure to ensure system stability, working with the OEM vendor on troubleshooting and part replacements
Manage day-to-day operations of the GPU cluster, HPC cluster, distributed storage system and other associated IT infrastructure (e.g., head nodes)
JOB REQUIREMENTS
Degree in Computer Engineering, Computer Science, or Electrical & Electronic Engineering
Proficient in UNIX/Linux operating systems and command-line interfaces (e.g., Ubuntu, Red Hat)
Familiar with monitoring tools (e.g., Prometheus, Grafana, PRTG, Environet)
Good knowledge and experience in HPC performance optimization and troubleshooting
Proven working knowledge of HPC systems and software
Strong programming skills in Python and Bash scripting
Familiarity with HPC schedulers (e.g., SLURM), container orchestration (e.g., Kubernetes), and GPU-based systems
Experience with HPC scheduling and workload management tools (e.g., Run.AI, SLURM) is preferred
Experience in managing parallel file systems (e.g., Lustre), with a strong understanding of HPC storage principles
Experience with cluster management software (e.g., BCM)
Proficient in Python and Bash scripting for automation tasks
Experience with container technologies (e.g., Docker); container orchestration using Kubernetes is a plus
SKILLS
PARALLEL COMPUTING, DISTRIBUTED SYSTEMS, CLUSTER MANAGEMENT
JOB ID:
EXPERIENCE: 2 ~ 4 years


