System Engineer
6 days ago
Job Description:
Situated in the heart of Singapore's Central Business District, Rakuten Asia Pte. Ltd. is Rakuten's Asia Regional headquarters. Established in August 2012 as part of Rakuten's global expansion strategy, Rakuten Asia comprises various businesses that provide essential value-added services to Rakuten's global ecosystem. Through advertisement product development, product strategy, and data management, among others, Rakuten Asia is strengthening Rakuten Group's core competencies to take the lead in an increasingly digitalized world.
AI & Data Division (AIDD) spearheads data science & AI initiatives by leveraging data from Rakuten Group. We build a platform for large-scale field experimentation using cutting-edge technologies to provide critical insights that enable faster and better contributions to our business. Our division boasts an international culture created by talented employees from around the world. Following the strategic vision “Rakuten as a data-driven membership company”, AIDD is expanding its data & AI related activities across multiple Rakuten Group companies.
As a System Engineer (GPU Infrastructure & Platform Engineering), you will build, scale, and optimize the GPU cluster infrastructure that supports both training (e.g., ranking models, LLMs) and inference workloads. Your focus will be on designing and building a GPU platform with sophisticated scheduling, elasticity, and quota management, ensuring efficient utilization, scalability, and stability for Rakuten’s AI workloads.
Key Responsibilities
- Optimize Kubernetes (K8s) for GPU workloads, including scheduling policies, autoscaling, and multi-tenant resource isolation.
- Deploy and maintain inference serving platforms (e.g., NVIDIA Triton, vLLM, SGLang) for high-throughput and low-latency model deployment.
- Automate cluster provisioning, monitoring, and recovery to maximize uptime and GPU utilization.
- Collaborate with ML engineers to troubleshoot GPU-related issues in training jobs (e.g., NCCL errors, OOM) and inference bottlenecks.
- Implement observability tools (Prometheus, Grafana) to track GPU utilization, job performance, and cluster health.
- Develop infrastructure-as-code (IaC) solutions for reproducible GPU environments (e.g., Terraform, Ansible).
Mandatory Qualifications
- 3+ years of experience in DevOps/MLOps, GPU infrastructure, or distributed computing.
- Deep expertise in Kubernetes (K8s) for GPU workload orchestration (e.g., Kubeflow, Volcano, custom schedulers).
- Strong programming skills in Go or Python for platform development, automation, and tooling.
- Proficiency in Linux system administration, performance tuning, and networking (e.g., RDMA, InfiniBand).
- Experience with IaC tools (Terraform, Ansible) and CI/CD pipelines (GitHub Actions, Jenkins).
- Bachelor’s or higher degree in Computer Science, Engineering, or a related field.
- Strong teamwork and communication skills, with a passion for solving infrastructure challenges.
Nice-to-Have Skills
- Familiarity with distributed training frameworks (e.g., PyTorch DDP, FSDP, DeepSpeed).
- Familiarity with the NVIDIA Triton serving framework or a similar framework, and with tuning serving parameters to balance latency against throughput.
- Hands-on experience with GPU clusters, including troubleshooting NVIDIA drivers, CUDA, and NCCL issues.
- Knowledge of high-performance storage (Lustre, WekaFS) for large-scale training data.
- Experience with LLM training/inference stacks (e.g., Megatron-LM, TensorRT-LLM).
Why Join Us?
- Build and scale cutting-edge GPU infrastructure for ranking models, LLMs, and real-time AI.
- Work with global AI/ML teams to solve high-impact infrastructure challenges.
- Opportunity to shape the future of Rakuten’s GPU platform for scalability and efficiency.
-
Information System Engineer
1 week ago
Singapore HK SYSTEM MAINTENANCE PTE. LTD. Full time
Information System Engineer Certificate/Nitec/Diploma in Electronics & Electrical Engineering or related. Able to design and troubleshoot CCTV system, Access Control system, Intercom system, simple Network system and Visitor Management system. Experience in coordinating with customers/end users, vendors and contractors. Able to work independently and...
-
System Engineer
1 week ago
Singapore PTC SYSTEM (S) PTE LTD Full time
**Duties and Responsibilities**: - Work with customer to undertake the design, installation, configuration, maintenance of network system solutions which include upgrades, migrations across heterogeneous network or systems - Work in fast-paced environment with tight schedule with dynamic project teams which include planning, designing, scoping and creating...
-
Sr. System Engineer
4 days ago
Singapore SYSNET SYSTEM AND SOLUTIONS PTE. LTD. Full time
Roles & Responsibilities We are seeking a highly skilled Senior IT System Engineer to manage, optimize, and secure our IT infrastructure. This role involves hands-on administration of servers, networks, cloud services, virtualization, backup solutions, and end-user systems. You will also lead initiatives in business continuity, disaster recovery, and IT...
-
System Engineers
2 weeks ago
Singapore ACOUSTIC & LIGHTING SYSTEM PTE. LTD. Full time
1. Audio Visual and Entertainment Lighting background
2. Strong understanding of system programming and system troubleshooting
3. Basic AutoCAD knowledge
4. Strong communication skills with overseas manufacturers
5. Able to travel frequently for overseas projects
6. Able to work on weekends/overtime occasionally
-
Systems and Support Engineer
1 week ago
Singapore TAK SYSTEM INTEGRATION PTE. LTD. Full time
**Job Description**: Tak System is looking for individuals who aren’t afraid to push beyond the norms of traditional IT. Innovative and creative in integrating traditional on-prem systems with cloud topology to provide customers with comprehensive solutions to enhance reliability and guard against emerging threats. **Requirement**: - Preferably Junior...
-
Technical Officer
1 week ago
Singapore ATT System Full time
**About the Role** We are seeking a hands-on and reliable Technical Officer to join our team, supporting the maintenance and operation of traffic and security communication systems. You will play a key role in ensuring smooth system performance through both corrective and preventive maintenance, while also assisting in project execution and on-site...
-
System Support Assistant
6 days ago
Singapore OCI SYSTEM PTE. LTD. Full time
**Job Scope for System Support Assistant** System Support Assistant for accounting system is essential for delivering excellent customer service, addressing initial technical inquiries, and setting the foundation for positive customer relationships. **Handling Pre-Sales Technical Support**: Assist potential customers by providing technical information about...
-
System Engineer
4 days ago
Singapore OMNI-PLUS SYSTEM LIMITED Full time
Administer and develop Domain Controller ADDS (Active Directory Domain Services), Terminal Server to achieve centralization control. - Install and manage Virtualization Technology (Hyper-V) - Secure and structure network resources with shared folder permissions, patching, managing & housekeeping storage using Synology NAS and Windows Server - Manage domain...
-
Engineer ()
1 day ago
Singapore Digital System Projects Full time $60,000 - $120,000 per year
You will work with the project manager to lead team members towards successful and timely completion of projects on various digital systems, including in automation, CCTV, card access, trunk radio and UPS. In addition to managing project milestones and deliverables, you work with respective stakeholders of these critical systems to ensure that project...
-
System Engineer
6 days ago
Singapore PTC SYSTEM (S) PTE LTD Full time
**Duties and Responsibilities**: - Work with customer to undertake the design, installation, configuration, maintenance of network system solutions which include upgrades, migrations across heterogeneous network or systems - Work in fast-paced environment with tight schedule with dynamic project teams which include planning, designing, scoping and creating...