Senior Site Reliability Engineer
5 days ago
Job Category
Software Engineering
Job Details
**About Salesforce**
Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn’t a buzzword — it’s a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.
Ready to level up your career at the company leading workforce transformation in the agentic era? You’re in the right place. Agentforce is the future of AI, and you are the future of Salesforce.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Salesforce services have the reliability, capacity, performance, and availability to meet our customers’ needs, and that they improve at the rate our customers expect.
Our software development focuses on enabling service owners to operate their services safely at scale, whether through paved path integrations onto observability frameworks, optimizing existing systems, designing infrastructure or eliminating work through AI/ML. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Salesforce, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. Experience with AI/ML systems, autonomous agents, or observability for intelligent platforms is a strong plus.
SRE’s culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
**Required Skills**
- 5+ years of experience in Python, Go, or Java for automation, tooling, and integration.
- Hands-on experience designing, building, and operating large-scale distributed systems, and identifying their shortcomings and optimization opportunities.
- Engineering resiliency and reliability: design and develop systems, tools, and platforms that strengthen the resiliency and reliability of distributed services.
- Strong experience with AWS or GCP and services like EC2, VPC, IAM, S3, EKS.
- Expertise in Kubernetes and modern container orchestration.
- Deep understanding of SRE principles: SLIs/SLOs, availability, resiliency, and incident metrics (TTD, TTR).
- Experience with AI/ML platforms, agents, or intelligent observability systems.
- Familiarity with observability tooling: Grafana, OpenTelemetry, Zipkin/Jaeger, and TSDBs.
- Hands-on experience with CI/CD pipelines and Git-based workflows.
- Experience with IaC and config management tools: Terraform, Helm, Ansible, or Puppet.
- Strong Linux systems knowledge and troubleshooting skills.
- Data-driven mindset for identifying systemic issues and improving service reliability.
**Responsibilities**
- Design, build, and maintain scalable backend systems and cloud-based services.
- Write clean, testable, and efficient code following engineering best practices.
- Develop automation and tooling to reduce manual effort and improve system reliability.
- Enhance observability through monitoring, logging, and distributed tracing.
- Support integration of AI-driven automation and observability platforms.
- Work closely with product and infrastructure teams to ship features and improvements iteratively in Agile teams.
- Define and implement SLIs/SLOs with engineering teams, driving reliability into system architecture.
- Build automation and self-healing capabilities to reduce manual operations.
- Operate and scale monitoring, alerting, and tracing systems for proactive issue detection.
- Lead post-incident analysis, conduct postmortems, and ensure effective root cause resolution.
- Improve CI/CD practices to accelerate safe, frequent deployments.
- Use data to uncover trends, inform prioritization, and drive platform improvements.
- Collaborate on integrating AI-driven automation and observability to enhance reliability.
- Support and scale multi-cloud, multi-region services.
**Desired Skills**
- Knowledge of microservices, service mesh, or zero-trust infrastructure.
- Experience operating in global, multi-tenant, or compliance-sensitive environments.
- Strong written and verbal communication, with emphasis on documentation and knowledge sharing.
Posting Statement
Salesforce is an equal opportunity employer and maintains a policy of non-discrimination with all employees and applicants for employment. What does that mean exactly? It means that at Salesforce, we believe in equality for