HPC System Engineer

3 weeks ago


Singapore A*STAR RESEARCH ENTITIES Full time
Roles & Responsibilities

Job Summary:

The HPC System Engineer will be responsible for managing, monitoring and optimizing the operation of the supercomputing system. This role involves collaborating with various research and technical teams to optimize HPC resource utilization. Successful candidates with demonstrated experience in the HPC field may be considered for a Senior position.

Roles and Responsibilities:

System administration and optimization

  • Work with Managed Services teams in managing and administering HPC systems, including servers, storage, and internal network components.
  • Ensure the reliability and availability of HPC infrastructure.
  • Provide support for technical queries and troubleshoot HPC-related problems.
  • Implement best practices for system monitoring and reporting.
  • Develop utility tools to support monitoring, tuning, and troubleshooting activities.
  • Document incident details, resolution, and lessons learned to enhance future problem-solving.
  • Implement security measures and monitoring to protect HPC systems.
  • Conduct regular security checks and assessments of the HPC system infrastructure.
  • Monitor system performance and optimize it through tuning and troubleshooting.

Resource and workload management

  • Monitor HPC resource utilization.
  • Develop and evaluate policies for allocating HPC resources.
  • Optimize job scheduling to maximize resource utilization.

Designing and planning

  • Assess future computational requirements and plan for system expansion.
  • Assist in designing future HPC system acquisitions.
  • Study and evaluate emerging technologies and trends, including but not limited to:
      ◦ processors and accelerators
      ◦ interconnect technology
      ◦ storage solutions
      ◦ programming models

Qualifications:

  • Degree in Computer Science, Engineering, IT or other relevant areas.
  • At least 3 years of experience in managing HPC systems.
  • Highly proficient in UNIX/Linux environments and command line interface (CLI).
  • Experience with cluster management software (xCAT, BCM, PHPC, HPCM).
  • Experience with job scheduling and workload management software (Slurm or PBS Pro).
  • Strong knowledge of HPC storage principles and experience in managing parallel file systems (Lustre, GPFS, BeeGFS).
  • Strong knowledge of RDMA-based interconnects (InfiniBand, RoCE).
  • Understanding of basic network protocols like DHCP, DNS, TFTP, SMTP, etc.
  • Good knowledge of scripting languages like Python, Bash or Perl.
  • Demonstrated ability to analyse complex issues and develop effective solutions.
  • To be considered for the Senior position, candidates should have at least 5 years of experience in roles involving the deployment of HPC systems, covering key areas such as design, installation, configuration, documentation, and admin/user training.

Skills:

Computer Engineering
Scalability
Data Management
Unix
Computer Science
Storage
Lustre
analyse product quality
Disaster Recovery
Linux
  • HPC Systems Manager

    5 days ago


    Singapore KLA-TENCOR (SINGAPORE) PTE. LTD. Full time

    Job Title: HPC Systems Manager. At KLA-TENCOR (SINGAPORE) PTE. LTD., we are seeking a highly skilled HPC Systems Manager to lead our team in developing and optimizing AI-driven solutions. As a key member of our engineering team, you will be responsible for driving innovation, optimizing performance, and fostering collaboration across cross-functional teams. Key...

  • HPC Systems Manager

    3 weeks ago


    Singapore KLA-TENCOR (SINGAPORE) PTE. LTD. Full time

    About the Role: We are seeking a highly skilled and experienced HPC Systems Manager to join our team at KLA-TENCOR (SINGAPORE) PTE. LTD. as a key member of our AI Systems and High-Performance Computing team. Key Responsibilities: Technical Leadership: Lead a team of engineers and system architects to develop the platform for deep learning training and...

  • HPC Systems Manager

    3 weeks ago


    Singapore KLA-TENCOR (SINGAPORE) PTE. LTD. Full time

    Roles & Responsibilities: As a System Design Engineering Manager specializing in AI Systems and High-Performance Computing (HPC), you’ll play a pivotal role in shaping the future of AI-driven solutions. Your leadership will drive innovation, optimize performance, and foster collaboration across cross-functional teams. Let’s delve into the details: Key...

  • HPC Storage Engineer

    3 weeks ago


    Singapore A*STAR RESEARCH ENTITIES Full time

    Roles & Responsibilities: Job Summary: The HPC Storage Engineer will be responsible for managing the storage infrastructure within HPC environments. This role involves monitoring storage performance and optimizing it through tuning and troubleshooting. Successful candidates with demonstrated experience in the HPC storage field may be considered for a Senior...


  • Singapore HPC AI TECHNOLOGY PTE. LTD. Full time

    AI Engineer: HPC AI TECHNOLOGY PTE. LTD. is seeking a highly skilled AI Engineer to join our team. As an AI Engineer, you will be responsible for developing and deploying distributed artificial intelligence systems on large-scale clusters or clouds. Key Responsibilities: Design and implement AI systems using TensorFlow/PyTorch and other frameworks. Optimize...

  • AI System Engineer

    3 weeks ago


    Singapore HPC AI TECHNOLOGY PTE. LTD. Full time

    About the Role: We are seeking a highly skilled AI System Engineer to join our team at HPC AI TECHNOLOGY PTE. LTD. as a key member of our AI engineering team. Key Responsibilities: Develop and Deploy Distributed AI Systems: Design, develop, and deploy large-scale distributed AI systems on cloud or cluster environments. Algorithm Development and Optimization:...


  • Singapore A*STAR RESEARCH ENTITIES Full time

    Roles & Responsibilities: Provide HPC and scientific domain advice to onboard new users onto NSCC systems. Engage new researchers, communities, and disciplines with computationally intensive requirements. Assist in the design of the next NSCC HPC systems, including benchmarking NSCC workloads on various platforms and recommending the most...


  • Singapore PHAIDON INTERNATIONAL (SINGAPORE) PTE. LTD. Full time

    Roles & Responsibilities: Our client is a leading company that provides electronics testing, measurement, and optimization solutions for various industries, including telecommunications, aerospace, and automotive. This full-time R&D Software Engineer role is focused on developing advanced software solutions for quantum technologies, including quantum computing,...

  • Software Engineer

    3 weeks ago


    Singapore EVOLUTION RECRUITMENT SOLUTIONS PTE. LTD. Full time

    Roles & Responsibilities: About the Company: A leading electronic measurement company, empowering scientists and engineers to tackle their most difficult technical challenges with confidence through innovative wireless, modular, and software solutions. Responsibilities: Develop and optimize parallel solvers for quantum control software. Leverage MPI and OpenMP...


  • Singapore KLA-TENCOR (SINGAPORE) PTE. LTD. Full time

    Job Title: High-Performance Compute Systems Engineer. We are seeking a highly skilled High-Performance Compute Systems Engineer to join our team at KLA-TENCOR (SINGAPORE) PTE. LTD. Key Responsibilities: Design and implement high-performance compute clusters, ensuring optimal performance and scalability. Develop and maintain in-depth knowledge of HPC systems,...


  • Singapore RANDSTAD PTE. LIMITED Full time

    Roles & Responsibilities: About the role: Design and enhance parallelized solvers for quantum control software. Leverage MPI and OpenMP to parallelize computational tasks across distributed and shared memory architectures. Implement and fine-tune algorithms for GPU acceleration using CUDA and other GPU computing frameworks. Employ Python and C++ for...


  • Singapore HPC AI TECHNOLOGY PTE. LTD. Full time

    Video Model Engineer: HPC AI TECHNOLOGY PTE. LTD. is seeking a highly skilled Video Model Engineer to join our team. As a key member of our AI research team, you will be responsible for designing, implementing, and optimizing text-to-video and image-to-video generation models. Key Responsibilities: Develop and train deep learning models for video generation,...

  • Project Manager

    5 days ago


    Singapore HPC BUILDERS PTE. LTD. Full time

    Job Title: Project Engineer. Job Summary: HPC Builders Pte. Ltd. is seeking a highly skilled Project Engineer to join our team. The successful candidate will be responsible for executing company projects in building construction, ensuring timely and smooth progress of works, and maintaining quality of works while complying with safety and environmental...

  • M&E Coordinator

    5 days ago


    Singapore HPC BUILDERS PTE. LTD. Full time

    Job Title: M&E Coordinator. We are seeking a highly skilled and experienced M&E Coordinator to join our team at HPC Builders Pte. Ltd. The successful candidate will be responsible for overseeing all M&E trades, ensuring timely and smooth progress of works, and ensuring quality of works is achieved and safety & environmental regulations are complied with. Key...

  • M&E Coordinator

    2 weeks ago


    Singapore HPC BUILDERS PTE. LTD. Full time

    Job Title: M&E Coordinator. We are seeking a highly skilled and experienced Mechanical and Electrical (M&E) Coordinator to join our team at HPC Builders Pte. Ltd. Key Responsibilities: Review project specifications and technical clarifications to ensure accuracy and completeness. Prepare method statements and check drawing discrepancies to ensure smooth project...

  • M&E Coordinator

    3 weeks ago


    Singapore HPC BUILDERS PTE. LTD. Full time

    Job Summary: HPC Builders Pte. Ltd. is seeking a highly skilled and experienced Mechanical and Electrical (M&E) Coordinator to oversee all M&E trades and ensure the successful completion of projects. Key Responsibilities: Project Planning and Coordination: Review project specifications and technical clarifications to ensure accuracy and completeness. Prepare method...


  • Singapore KLA-TENCOR (SINGAPORE) PTE. LTD. Full time

    Roles & Responsibilities: Support high-performance compute clusters. Working knowledge of HPC systems, including CPU/GPU architecture, scalable/robust storage, high-bandwidth interconnects, and a knowledge of cloud-based computing architectures. Generate HW BOMs for the HPC Clusters, provide vendor management and oversee HW release...


  • Singapore HPC BUILDERS PTE. LTD. Full time

    Job Title: Site Engineer. HPC Builders Pte. Ltd. is seeking a highly skilled Site Engineer to join our team. As a Site Engineer, you will be responsible for ensuring the successful execution of construction projects from start to finish. Key Responsibilities: Monitor and review master construction programs to ensure timely completion and quality of...

  • Video Model Engineer

    3 weeks ago


    Singapore HPC AI TECHNOLOGY PTE. LTD. Full time

    Job Summary: HPC AI TECHNOLOGY PTE. LTD. is seeking a highly skilled Video Model Engineer to join our team. As a key member of our AI research and development team, you will be responsible for designing, implementing, and optimizing text-to-video and image-to-video generation models. Key Responsibilities: Model Development: Design and implement cutting-edge video...


  • Singapore HPC BUILDERS PTE. LTD. Full time

    Job Title: Architectural Coordinator. Job Scope: We are seeking a highly skilled Architectural Coordinator to join our team at HPC Builders PTE. LTD. The successful candidate will be responsible for executing company projects in architectural works in building construction. The role involves: Reviewing project specifications, technical clarifications, and...