HPC System Engineer

2 days ago


Kallang, Singapore Alpsoft Technologies Pte. Ltd. Full time

We are seeking an experienced and highly skilled **Senior HPC System Engineer** to manage and optimize a complex multi-cluster high-performance computing (HPC) environment. This position will play a key role in the deployment, administration, and support of GPU and hybrid clusters, ensuring high availability, performance, and scalability of HPC systems used for advanced research and computing workloads.

**Key Responsibilities:**

**Cluster & System Administration**
- Manage multiple HPC clusters including Hopper (GPU), Atlas (Hybrid), and Vanda (CPU & GPU).
- Administer on-premises and cloud-based HPC infrastructure on platforms such as Dell X9860/H200, AWS Hpc6a, and Dell R760 with A40 GPUs.
- Perform OS-level management and patching on Red Hat Enterprise Linux 9.0 and CentOS 7/8.
- Maintain 100G Ethernet and InfiniBand NDR network configurations for cluster connectivity.

**Performance, Monitoring & Troubleshooting**
- Utilize tools such as Grafana, Ganglia, and iDRAC for system monitoring and performance tuning.
- Implement and troubleshoot workload schedulers such as PBS Professional.
- Analyze and resolve performance bottlenecks to keep systems running at optimal performance.

**Cloud & Hybrid Infrastructure Management**
- Manage hybrid and cloud-native HPC nodes, including integration of AWS Reserved Instances with PBS Professional.
- Handle cloud connectivity using Site-to-Site VPN.

**Cluster Management & Tools**
- Use Bright Cluster Manager, AWS Dashboard, and related tools for HPC management.
- Deploy and support Kubernetes and Singularity for containerized HPC workloads.

**Software & Module Management**
- Support HPC software stacks using Environment Modules, EasyBuild, and FlexLM license servers.

**Storage & Filesystem Management**
- Administer parallel file systems and storage platforms such as BeeGFS, Dell EMC Isilon, and ECS storage.

**Security & Authentication**
- Enforce user access and authentication via CentrifyDC and LDAP.
- Ensure systems are compliant with internal security policies.

**Collaboration & User Support**
- Provide advanced technical support to researchers using tenant servers.
- Collaborate with cross-functional teams on HPC architecture improvements, upgrades, and roadmap planning.

**Required Qualifications**:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience managing large-scale HPC systems.
- Deep knowledge of Linux-based operating systems (Red Hat, CentOS).
- Proven experience with PBS Professional, Bright Cluster Manager, AWS, and Grafana.
- Familiarity with container environments (Kubernetes, Singularity) and storage systems (BeeGFS, Isilon).
- Hands-on experience with GPU clusters and hybrid cloud architecture.
- Strong scripting skills (Shell, Python, etc.).
- Excellent troubleshooting and documentation skills.

**Job Types**: Full-time, Permanent

Pay: $7,000.00 - $9,000.00 per month

Schedule:

- Monday to Friday

Work Location: In person


  • Systems Engineer

    5 days ago


    Kallang, Singapore ANTLABS PTE. LTD. Full time $60,000 - $120,000 per year

    We are seeking a proactive and skilled Systems Engineer to join our Service Delivery team. You will be responsible for the deployment of Linux-based applications, systems and related infrastructure for our projects and customers. This role requires strong technical implementation and troubleshooting skills, a collaborative mindset, and a commitment to...


  • Kallang, Singapore Alerton Australia Full time

    **The Company** Our client is a growing, market-leading company specialising in Building Automation and Energy Management Systems. Currently seeking recent graduates within a relevant engineering field to join our team of engineers based in **Singapore.** If your qualifications were obtained recently, you are passionate about Building Automation and Energy...


  • Kallang, Singapore Arrowcrest Technologies Pte Ltd Full time $60,000 - $120,000 per year

    Key Responsibilities: • Manage, maintain, and ensure the stability of the organization's internal network infrastructure. • Design, configure, and implement network setups for new projects and client deployments. • Manage and upkeep system software deployed at client sites, ensuring timely updates and security patches. • Perform software upgrades,...

  • R&D Engineer

    3 days ago


    Kallang, Singapore Omni-Plus System Pte Ltd Full time $60,000 - $120,000 per year

    Job Responsibilities: New Product Development - Research new product development for business needs - Identify customer's challenges and needs with their technical teams - Conduct project feasibility assessment for business needs - Perform product development projects scoping activities according to customer requirements and provide estimated budget of projects after...

  • Systems Engineer

    4 days ago


    Kallang, Singapore Jobline Resources Pte Ltd Full time

    **Responsibilities**: - To work closely with project teams and vendors to ensure the successful implementation of new projects and service requests. - To assist DBA in maintenance support for Databases. - To embrace new technologies such as Cloud, containers, etc. for new systems/projects **Requirements**: - Diploma/Degree in IT or equivalent experience...

  • Wintel Engineer

    2 days ago


    Kallang, Singapore Jobline Resources Pte Ltd Full time

    **Responsibilities**: - Implementation and ongoing maintenance, security, and availability of Windows based infrastructure - Troubleshoot and resolve Windows OS related incidents and problems according to customer processes - Perform Root Cause Analysis (RCA) for OS related service problems as part of problem management - Liaise with support vendors for...

  • Software Engineer

    2 days ago


    Kallang, Singapore ACP Computer Training School Pte. Ltd. Full time

    **Jobscope** - Requirements gathering - Design, development and maintenance of the software - Preparation and submission of deliverables throughout the software development lifecycles such as Business Rules, Software Requirement Specifications, Software Architecture Document, Design Specification, Interface Specifications, Source Codes, Testing...


  • Kallang, Singapore Rugged Asia Pte Ltd Full time

    **Responsibilities**: - Install, configure, troubleshoot and support servers or virtual machines (Hyper-V / VMware) running on Windows Server operating systems. - Perform system maintenance, upgrades, patches and incident resolution outside of working hours when required. - Monitor and implement backup solutions for critical systems and data...


  • Kallang, Singapore KBR Full time

    **Title**: Material Management Systems Administrator **Job responsibilities**: - Assess the project requirements and risks to configure the internal systems in a way that will deliver success on the project. - Define functional requirements for development and modification of the internal systems to address project specific requirements - Work with the...

  • Systems Engineer

    7 days ago


    Kallang, Singapore ANTLABS PTE. LTD. Full time $60,000 - $120,000 per year

    We are looking for a candidate who loves challenges and working in a fast-paced environment, applies their technical expertise confidently, and has a passion for troubleshooting and analysis. You will be part of the technical support team that provides technical support to our customers. You should be a resourceful team player who is self-motivated,...