Staff Platform Engineer

2 days ago


Singapore Centre for Strategic Infocomm Technologies (CSIT) Full time

You will be part of a dynamic team responsible for building resilient network infrastructure using cutting-edge technologies such as cloud-based and software-defined networking, e.g. SD-WAN, ACI and NSX. You must have a good understanding of IT infrastructure systems and knowledge of the latest networking technologies and platforms. You will be a technical specialist in the team, and must be keen to take on new challenges and keep abreast of the rapidly evolving technology landscape.

**Role**:
**Responsibilities**:
- Lead a team to deliver a resilient, scalable and secure HPC platform, including compute nodes, storage systems, networks and job scheduling systems.
- Lead, design, implement and manage the HPC infrastructure platform to meet organisational needs.
- Design and implement storage solutions for HPC workloads to ensure efficient data storage and retrieval.
- Design and implement high-performance networking solutions, including InfiniBand, Ethernet, and other interconnects.
- Plan and manage HPC resource capacity, including forecasting, procurement and deployment of new hardware and software.
- Manage HPC clusters, including optimizing, monitoring and troubleshooting cluster performance, as well as managing job scheduling and resource allocation.
- Ensure the security and compliance of the HPC infrastructure platform, including managing access controls, implementing security patches, and conducting regular security checks.

**Requirements (Minimum Qualifications)**:
- Bachelor's degree in Computer Science, Computer Engineering, or a related field.
- 8+ years of experience in managing HPC systems, including experience with Linux, Unix, or other operating systems.
- Strong knowledge of HPC architectures, including clusters, grids, and clouds.
- Experience with HPC job scheduling systems, such as Slurm, Torque and LSF.
- Strong understanding of storage systems, including SANs, NAS, and object storage.
- Experience with high-performance networking, including InfiniBand, Ethernet, and other interconnects.
- Experience with cloud computing platforms, such as AWS, Azure, or Google Cloud.
- Experience with scripting languages, such as Python, Perl, or Bash.
- Experience with containerization (Docker, Kubernetes), proficiency in a range of complementary technologies, including Knative, Run:AI, Grafana, Prometheus, Kyverno, ArgoCD, Rancher and NVIDIA BCM, and knowledge of NVIDIA SuperPOD architecture.
- Experience in leading engineering teams.

**Nice to Have**:
- Certifications in NVIDIA AI Infrastructure and Operations, and Certified Kubernetes Administrator.
- Experience with machine learning or deep learning frameworks, such as TensorFlow or PyTorch.
- Familiarity with agile development methodologies and version control systems, such as Git.

**Why join us?**:
- The work is purposeful and meaningful
- You will work with the best engineers
- We work with modern technologies and tech stacks
- We have an excellent engineering culture and work-life balance
- We aspire to engineering and operational excellence
- We empower you to innovate
- We grow together as a family
- As CSIT is an agency under the Ministry of Defence (Singapore), only Singapore Citizens will be considered.


