Lead AI Infrastructure Engineer

4 days ago


Singapore, Singapore | Thoughtworks Inc. | Full time

Thoughtworks Singapore will shortlist only applicants who currently have the right to work in Singapore, i.e. Singapore Citizens and Singapore Permanent Residents.

At Thoughtworks, Lead AI Infrastructure Engineers design and maintain high-performance, scalable, and resilient infrastructure for modern AI workloads. You’ll focus on enabling advanced inference systems, including LLMs, VLMs, and SLMs, across on-premises GPU clusters and cloud environments. This role is critical to ensuring our clients’ AI systems meet demanding requirements for throughput, latency, availability, and compliance.

As a senior technical leader, you will partner with ML engineers, platform engineers, AI researchers, and client stakeholders to deliver optimized infrastructure that is both robust and future-proof. You will combine deep expertise in GPU-based inference infrastructure with a broader understanding of DevOps, agile delivery, and platform engineering to drive impactful AI solutions at enterprise scale.

Job responsibilities
  • Design and operate GPU-based infrastructure (e.g., NVIDIA GB200, H100) across cloud and self-hosted environments.
  • Architect scalable inference platforms that support real-time and batch serving with high availability, load balancing, and fault tolerance.
  • Integrate inference workloads with orchestration frameworks such as Kubernetes, Slurm, and Ray, as well as observability stacks like Prometheus, Grafana, and OpenTelemetry.
  • Automate infrastructure provisioning and deployment using Terraform, Helm, and CI/CD pipelines.
  • Collaborate with ML engineers to co-design systems optimized for low-latency serving, continuous batching, and advanced inference optimization techniques (quantization, distillation, pruning, KV caching).
  • Lead client engagements by shaping technical roadmaps that align AI infrastructure with business objectives, ensuring compliance, scalability, and performance.
  • Champion DevOps and agile practices to accelerate delivery while maintaining reliability, quality, and resilience.
  • Mentor and guide teams in best practices for AI infrastructure engineering, fostering a culture of technical excellence and innovation.
Job qualifications
Technical Skills
  • Expertise in GPU-based infrastructure for AI (H100, GB200, or similar), including scaling across clusters.
  • Strong knowledge of orchestration frameworks: Kubernetes, Ray, Slurm.
  • Experience with inference-serving frameworks (vLLM, NVIDIA Triton, DeepSpeed).
  • Proficiency in infrastructure automation (Terraform, Helm, CI/CD pipelines).
  • Experience building resilient, high-throughput, low-latency systems for AI inference.
  • Strong background in observability and monitoring: Prometheus, Grafana, OpenTelemetry.
  • Familiarity with security, compliance, and governance concerns in AI infrastructure (data sovereignty, air-gapped deployments, audit logging).
  • Solid understanding of DevOps, cloud-native architectures, and Infrastructure as Code.
  • Exposure to multi-cloud and hybrid deployments (AWS, GCP, Azure, sovereign/private cloud).
  • Experience with benchmarking and cost/performance tuning for AI systems.
  • Background in MLOps or collaboration with ML teams on large-scale AI production systems.
Professional Skills
  • Proven ability to partner with senior client stakeholders (CTO, CIO, COO) and translate technical strategy into business outcomes.
  • Skilled at leading multi-disciplinary teams and building trust across diverse technical and business functions.
  • Strong communication skills, with the ability to explain complex AI infrastructure concepts to both technical and non-technical audiences.
  • Comfortable navigating uncertainty, making pragmatic decisions, and adapting quickly to evolving technologies.
  • Passionate about creating scalable, sustainable, and high-impact solutions that help transform industries with AI.
Other things to know
Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we’re pushing boundaries through our purposeful and impactful work. For 30+ years, we’ve delivered extraordinary impact together with our clients by helping them solve complex business problems with technology as the differentiator. Bring your brilliant expertise and commitment to continuous learning to Thoughtworks. Together, let’s be extraordinary.
