Site Reliability Engineer, VP

1 month ago


Singapore, Singapore Blackstone Full time

Blackstone is the world’s largest alternative asset manager. We seek to create positive economic impact and long-term value for our investors, the companies we invest in, and the communities in which we work. We do this by using extraordinary people and flexible capital to help companies solve problems. Our $ trillion in assets under management include investment vehicles focused on private equity, real estate, public debt and equity, infrastructure, life sciences, growth equity, opportunistic, non-investment grade credit, real assets and secondary funds, all on a global basis.

 

Job Description:

Blackstone’s Site Reliability Engineering team is responsible for improving the reliability of systems and services across the firm. This is achieved through the education and enablement of engineers on SRE practices and principles. You’ll have the opportunity to evaluate and select tools, deploy and maintain observability systems and pipelines, mature the operations and support of services and platforms, and solve new problems and challenges as they arise.

 

This position involves the selection, implementation, and maintenance of key observability tooling. It requires ongoing evaluation of the firm’s needs in observability, monitoring, alerting, resilience, and recovery. We collaborate with service owners on design, implementation, and management of services for continuous improvements. We improve the reliability of services by continuously evaluating availability using clear definitions and measurable targets. We plan for and practice recovery from disaster scenarios and respond in real time to incidents alongside service owners. We guide the postmortem process for continuous improvement.

 

Key Responsibilities:

  • Enabling and assisting in the understanding and adoption of SRE methodologies across the firm

  • Setting standards and objectives to measure and improve the firm’s adoption of SRE principles over time

  • Partnering with colleagues in various roles and reporting lines to establish indicators and targets for service reliability

  • Collaborating to implement SLO-based monitoring for many platforms and services

  • Leveraging software and systems engineering skill sets to achieve and maintain availability targets while enabling developer velocity

  • Implementing monitoring and alerting that reflects the reliability of services for users and enables effective on-call operations

  • Evaluating, selecting, and implementing strategic observability tools and working to minimize overhead in maintenance

  • Participating in on-call rotations and responding to system incidents to minimize downtime and ensure service availability

  • Using automation to manage, maintain, and scale SRE systems and to minimize individual operational toil

  • Fostering a blameless culture while driving postmortem discussions and reporting

 

Qualifications:

  • Ability to write automation scripts, as well as read and troubleshoot code (Python, Bash, C#, JavaScript, etc.)

  • Proficiency with public cloud providers (strong AWS experience; Azure experience preferred)

  • Experience with configuration-as-code, infrastructure management, and adjacent CI/CD tooling (Terraform, Puppet, GitLab, Jenkins)

  • Hands-on experience with Docker and container schedulers, including AWS ECS and EKS

  • Excellent troubleshooting skills for Linux, Windows, and Networking

  • Experience with observability tools (Grafana, Prometheus, Splunk, etc.)

  • Experience with incident management and conducting postmortems

  • Excellent communication and organizational skills

  • Drive to improve systems and processes through a sense of shared ownership

 


The duties and responsibilities described here are not exhaustive and additional assignments, duties, or responsibilities may be required of this position. Assignments, duties, and responsibilities may be changed at any time, with or without notice, by Blackstone in its sole discretion.



