
Senior System Reliability Engineer
2 days ago
This position involves working within a dynamic, global team dedicated to advanced reliability testing of cutting-edge products. The specialist will collaborate closely with cross-functional teams across various organizations on the setup and testing of accelerator product systems.
Main Responsibilities:
- Setup and Testing:
- Plan, execute, and optimize system-level setups for accelerator products, including server rack and system configurations.
- Ensure seamless integration and functionality of server systems with advanced cooling solutions and environmental management systems.
- Validate and maintain reliability test scripts for automated and manual testing processes.
- Reliability Assessment and Testing:
- Conduct comprehensive reliability assessments of accelerator systems, focusing on mechanical, thermal, and electrical stress factors.
- Design and implement environmental stress tests to simulate data center conditions, including operational stress, thermal cycling, and signal and power integrity.
- Evaluate material interactions and their impact on product reliability, ensuring robustness in diverse operating environments.
- Analyze results to identify potential reliability risks and areas for design improvement.
- Functional Testing and Fault Isolation:
- Perform detailed functional testing to evaluate system performance under various operational conditions.
- Identify, isolate, and troubleshoot faults using advanced diagnostic tools and methodologies.
- Failure Analysis and Reporting:
- Perform root cause analysis for identified reliability failures and develop corrective actions for design and process enhancement.
- Collaborate with cross-functional teams to conduct root cause analysis of reliability testing failures.
- Collaboration and Documentation:
- Work closely with design, manufacturing, and quality teams to align reliability goals with overall product requirements.
- Generate comprehensive reports detailing reliability test results, analysis, and recommendations.
- Maintain meticulous records of testing methodologies and outcomes for future reference and continuous improvement initiatives.
- Mentorship:
- Effectively mentor junior engineers, providing guidance in both technical domains and professional skill development to foster growth and team success.
Key Requirements:
- Knowledge of reliability engineering principles, the product lifecycle, and standards in high-performance computing environments.
- Proven experience in system-level setup and testing for accelerator products or similar technologies.
- Proficiency in developing and executing reliability test scripts and protocols.
- Familiarity with reliability standards and best practices in high-performance computing environments.
- Familiarity with data center environmental management, server rack/system configurations, and integrated cooling solutions.
- Strong understanding of environmental stress factors, including thermal, mechanical, and electrical stresses, in server systems (L6–L10).
- Expertise in failure analysis techniques, including root cause analysis and fault isolation methodologies.
- Excellent written and verbal communication skills for clear reporting and collaboration.
- Strong analytical and problem-solving skills.
- Experience with reliability testing tools, simulation software, and statistical tools is an added advantage.
- Knowledge of project and risk management is an added advantage.
- Self-starter and able to independently drive tasks to completion.
- Ability to structure and execute complex analysis, draw insights, and communicate summary conclusions/recommendations to senior management and customers/partners.
- Ability to network, build relationships, and collaborate to drive effective decision-making across multiple functions and levels.
Education:
- Bachelor's or Master's degree in Electrical/Electronics Engineering (EE) or a related field.
-
Reliability Engineer/ Senior Engineer
2 weeks ago
Singapore | Systems on Silicon Manufacturing Co. Pte. Ltd. | Full time
Position Detail: Reliability Engineer/Senior Engineer. Posting Date: 03 Jul 2025 | Closing Date: 01 Oct 2025. SSMC (Systems on Silicon Manufacturing Company Pte. Ltd.) is a joint venture between NXP and TSMC. We offer flexible and cost-effective semiconductor fabrication solutions by maintaining a fully equipped SMIF cleanroom environment, 100% equipment...
-
Reliable Systems Engineer
1 day ago
Singapore | beBeeReliability | Full time | $100,000 - $140,000
Job Overview: The position of System Reliability Engineer Specialist is available in a dynamic global team focused on advanced reliability testing for cutting-edge products. Key Responsibilities: System-Level Setup and Testing. Develop and optimize system-level setups for accelerator products, including server rack and system configurations. Ensure seamless...
-
Systems Reliability Engineer
4 days ago
Singapore | NodeFlair | Full time
Job Summary: Salary S$8,000 - S$10,000 / month; Seniority: Mid; Years of Experience: at least 4 years; Tech Stacks: Go, Cloudflare, CI, Chef, Puppet, UNIX, Linux, Ansible, SQL, PostgreSQL, MySQL, Redis, Python. About Us: We realize people do not fit into neat boxes. We are looking for curious and empathetic individuals who are...
-
Senior Principal Reliability Engineer
7 days ago
Singapore | NXP Semiconductors | Full time
Job requisition ID: R-
We are looking for a Reliability Engineer in preparation for the formation of the joint venture of NXP and VIS, known as VSMC. Job Description: This posting is for a Senior...
-
Chief System Reliability Strategist
4 days ago
Singapore | beBeeReliability | Full time | $80,000 - $120,000
Job Description: We are seeking a seasoned Senior Site Reliability Engineer to join our technology infrastructure team in Singapore. This individual will play a key role in designing, building, and operating critical platforms, pipelines, and tooling that power large-scale global systems. Role Overview: You will work closely with cross-functional teams to...
-
Senior Site Reliability Engineer
1 week ago
Singapore | AKAMAI TECHNOLOGIES APJ PTE. LTD. | Full time
As a Senior Site Reliability Engineer, you will influence a wide array of teams. You will be responsible for the performance and reliability of Akamai’s delivery products by working with the Product, Engineering, and Support teams to diagnose, mitigate, and solve outages. You will have to solve some of the most complex problems in distributed systems at...
-
Senior Manager Reliability Engineering
2 weeks ago
Singapore | Advanced Micro Devices | Full time
Senior Manager Reliability Engineering: Singapore, Singapore; Engineering; 66745. Job Description: WHAT YOU DO AT AMD CHANGES EVERYTHING. We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing...
-
Site Reliability Engineer
2 weeks ago
Singapore | Hyphen Connect | Full time
Site Reliability Engineer (Crypto Trading). We are hiring for one of our ecosystem projects in...
-
System Reliability Engineer
1 day ago
Singapore | beBeeInstrumentation | Full time | $80,000 - $120,000
Job Title: Instrumentation Specialist. Job Description: The Instrumentation Specialist is responsible for maintaining the instrumentation for all plant operations, including field installations and associated control devices. Key responsibilities include developing maintenance and reliability plans related to instrumentation systems, providing technical support...
-
Senior Site Reliability Engineer
2 weeks ago
Singapore | Sea Limited | Full time
Engineering and Technology - Infrastructure, Singapore - Experienced (Individual Contributor). Our DevOps Engineering team plays an important role in developing and maintaining the internal systems and tools for the Infrastructure team. As a Senior Site Reliability Operation Engineer, you are responsible for improving the availability and reliability of our...