Site Reliability Engineer

7 days ago


Singapore Ubisoft Full time

**Company Description**
**CREATOR OF WORLDS**

Ubisoft’s 20,000 team members, working across more than 40 locations around the world, are bound by a common mission to enrich players’ lives with original and memorable gaming experiences. Their dedication and talent have brought to life many acclaimed franchises such as Assassin’s Creed, Far Cry, Watch Dogs, Just Dance, Rainbow Six, and many more to come. Ubisoft is an equal opportunity employer that believes diverse backgrounds and perspectives are key to creating worlds where both players and teams can thrive and express themselves. If you are excited about solving game-changing challenges, working with cutting-edge technologies and pushing the boundaries of entertainment, we invite you to join our journey and help us create the unknown.

Since opening its doors in 2008, Ubisoft Singapore has become the biggest AAA game development studio in Southeast Asia. The 500-strong studio is home to 35+ different nationalities focused on delivering ambitious gaming experiences to our players. Ubisoft Singapore has contributed to every Assassin’s Creed® title since Assassin’s Creed® II. It innovated within the franchise as the studio behind the naval battle gameplay and water technology in Assassin’s Creed® III, Assassin’s Creed® IV Black Flag® and, most recently, Assassin’s Creed® Valhalla. Its expertise in AAA and live operations, combined with a passion for naval gameplay, pushed the team to lead the development of Skull and Bones, revealed at E3 in 2017.

**Job Description**
**YOUR DAILY ADVENTURE**

The Site Reliability Engineer (SRE) is responsible for Ops and development tasks such as level 4 support and the implementation of highly scalable game infrastructure. The SRE works as the infrastructure services integrator who enables production to build games using the principles of cloud-native, DevOps and continuous delivery. The SRE has a good development background with knowledge of infrastructure and automation.
**WHAT YOU WILL DO**
- Design and/or implement a highly scalable Cloud and Bare Metal server and network infrastructure
- Share responsibility and ownership of game functions and services with developers who create them
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Practice sustainable incident response and blameless postmortems.
- Debug and optimize code and automate routine tasks (“toil”)
- Consult on the game's software and data architecture to ensure maximum infrastructure scalability
- Ensure reliability and consistency of game data
- Work with developers to develop adequate monitoring and monitor system events to ensure health, maximum system availability and service quality
- Assist in evaluating new requirements, technical design and standards
- Reduce the cost of failure for changes
- Define prescriptive ways to measure reliability

**Qualifications**
**Education**:
A baccalaureate degree or equivalent experience in Computer Information Systems, Computer Science, Mathematics or a related field.
**Relevant experience**:
2+ years of experience with software development or 5+ years of automation-focused system administration with hybrid hosting solutions.
- Experience in one or more of the following is a plus: C, C++, C#, Java, Python, Go or Ruby.
**WHAT YOU BRING**
**Skills**:

- Self-driven and slightly paranoid about system stability
- Ability to teach fundamental principles to other engineers/experts
- Skill in developing techniques and methodologies to resolve unprecedented problems or situations
- Ability to make complex information accessible to non-technical people
**Knowledge**:

- In-depth knowledge of Linux system internals and operating system design
- In-depth understanding of public cloud providers (GCP, AWS) and the OpenStack platform
- In-depth knowledge of CI/CD, GitLab and change management
- In-depth knowledge of infrastructure orchestration with Terraform
- Proficient knowledge in orchestration systems such as Kubernetes
- Proficient knowledge in configuration management tools such as SaltStack, Chef, Puppet and Ansible
- Proficient knowledge in dashboards (Grafana), alerting and monitoring systems
- Proficient knowledge in Prometheus
- Proficient knowledge in VictoriaMetrics
- Proficient knowledge in relational database systems like MySQL
- Proficient knowledge in document storage systems like MongoDB
- Proficient knowledge in Redis/PostgreSQL

**Additional Information**
**WHAT YOU’LL ENJOY**

**JO


