Platform & Reliability Engineer
1 week ago
- **SLOs & error budgets**: Define, track, and evangelize latency and availability targets for our payment APIs.
- **Observability**: Deploy Cloud Monitoring, Cloud Trace, Error Reporting, and dashboards; integrate alerts with Incident.io and Slack for on-call.
- **Incident lifecycle**: Establish blameless postmortems, guardrails, and runbooks to drive learning and prevent recurrence.
- **CI/CD golden path**: Codify Cloud Build pipelines and automated canary rollouts for Cloud Functions / Cloud Run.
- **Infrastructure as Code**: Manage GCP resources as code; embed security, least-privilege IAM, and cost controls by default.
- **Performance & cost tuning**: Profile hot paths (BigQuery, Firestore, Pub/Sub) and implement caching or concurrency improvements to keep user-facing latency under 100 ms.
- **Developer tooling**: Eliminate toil by improving local-to-prod parity and secrets management, and by making it possible to spin up an environment with a single command.
- **Culture carrier**: As the first platform-focused hire, instill reliability thinking across engineering and product.

**Requirements**:
- 5+ years of experience building and operating production systems at scale, ideally on Google Cloud or a similar serverless stack and in fast-paced or startup settings.
- Hands-on fluency with Firebase, Cloud Build, Cloud Run/Functions, Pub/Sub, Cloud SQL/Spanner, and VPC Service Controls.
- Strong coding skills in Python or Go for automation, with an eye on maintainability.
- A demonstrated record of driving observability, on-call, and cost optimisation in a fast-moving environment.
- Excellent collaboration and communication skills for working effectively with cross-functional teams.
- Experience with payments, PCI-DSS, or crypto settlement flows is a bonus.

**Tech note:** _We are **99% serverless**. There are no pet VMs to patch, but the stakes are higher: every cold start, DB connection pool, and retry policy can affect real money transfers. You'll architect for resiliency and velocity._
-
Amps Engineer
7 hours ago
Singapore · Pfizer · Full time
Company Description: Entrusted by Pfizer Singapore, Cielo Talent supports Pfizer to recruit permanent employees for the expansion of Pfizer Tuas manufacturing site in Singapore. **Why Pfizer** Pfizer careers are like no other. In our culture of individual ownership, we believe in our ability to improve future healthcare, and potential to transform millions...
-
Platform & Reliability Engineer
2 weeks ago
Singapore · Breeze · Full time
Overview: Are you passionate about solving complex challenges in the fintech space? We're looking for talented individuals to join our dynamic startup, backed by Sequoia Capital. We're building the universal
-
Singapore · Shopify · Full time
Company Description: Shopify is the leading omni-channel commerce platform. Merchants use Shopify to design, set up, and manage their stores across multiple sales channels, including mobile, web, social media, marketplaces, brick-and-mortar locations, and pop-up shops. The platform also provides merchants with a powerful back-office and a single view of...
-
Site Reliability Engineers/Platform Engineers
2 weeks ago
Singapore · Razer · Full time · $120,000 - $180,000 per year
Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working across a global team located across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put...
-
Site Reliability Engineer, Traffic Platform
2 weeks ago
Singapore · ByteDance · Full time
Site Reliability Engineer, Traffic Platform
About the Team: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed infrastructures. Our SREs are tasked to ensure the traffic services are reliable, fault-tolerant, efficiently scalable and cost-effective. You will have the opportunity...
-
Public Sector Platform Engineer
6 days ago
Singapore · Xtremax · Full time
A leading tech company in Singapore is seeking a Platform Operations Engineer to enhance IT infrastructure for government agencies. You will maintain on-premises platforms, ensure system reliability, and implement modern operational practices. Candidates with public sector experience are preferred. Responsibilities include managing critical infrastructure,...
-
Singapore · Razer Inc. · Full time
Site Reliability Engineers/Platform Engineers (Mid/Senior)
Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working across a global team located across 5 continents. Razer is also a great place to work, providing you the...
-
Site Reliability Engineer
2 weeks ago
Singapore · ByteDance · Full time
Site Reliability Engineer - Media Platform
Responsibilities: Build global infrastructure for multimedia transport, storage and processing, to serve billions of users all over the world. Engage in global production system management, such as monitoring, emergency response, capacity planning and optimization. Build tools, automations, visualizations and...
-
Site Reliability Engineer, Traffic Platform
3 days ago
Singapore · ByteDance · Full time · $120,000 - $200,000 per year
Location: Singapore · Team: Technology · Employment Type: Regular · Job Code: A111172A
Responsibilities
About the Team: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed infrastructures. Our SREs are tasked to ensure the traffic services are reliable, fault-tolerant, efficiently scalable and...