VP of Site Reliability Engineering

7 days ago


Singapore DBS Bank Limited Full time
Business Function

Group Technology enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group Technology, we manage the majority of the Bank's processes and aspire to delight our business partners through our multiple banking delivery channels.

Job Objective

DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and making continuous improvements to increase the platform's efficiency and resiliency. The SRE engineer will also perform automation development tasks to remove toil and increase the team's productivity.

Responsibilities
  • Develop monitoring and onboarding guidelines for various applications using the observability platform stack, ensuring accurate monitoring and data collection.
  • Drive observability standards, best practices, operations and processes for the enterprise in AppDynamics and other observability tools.
  • Automate routine tasks and reporting processes in AppDynamics and other observability tools using APIs and scripting, reducing manual effort and improving efficiency.
  • Identify and resolve performance issues through detailed analysis of transaction traces, application logs, and system metrics.
  • Collaborate with stakeholders to define performance metrics and monitoring requirements aligned with business goals.
  • Contribute to internal knowledge bases, create documentation, and share insights with the team to promote a culture of learning and collaboration.
  • Design and implement monitoring solutions to track application performance, identifying bottlenecks and optimising system efficiency.
  • Conduct performance tuning and capacity planning to ensure applications meet scalability and reliability requirements.
  • Develop custom dashboards and reports to provide actionable insights and drive decision-making processes.
  • Collaborate with development and operations teams to integrate the observability platform stack with CI/CD pipelines and other DevOps tools.
  • Configure and fine-tune alerts to proactively detect and address performance issues before they impact end-users.
  • Continuously review and enhance monitoring processes and methodologies to improve efficiency and effectiveness.
  • Work with application teams to develop long-term monitoring strategies that align with business goals and technology roadmaps.
  • Create data retention policies and access controls (RBAC) to manage user permissions.
  • Perform application maintenance, patching, and upgrades of controller versions, agents, etc., and ensure EOS/EOL is maintained.
Deliverables
  • Ensure on-time delivery of tasks and projects.
  • Ensure continuous uptime of applications and services.
  • Ensure no security or audit issues.
Job Dimensions
  • Comply with bank standards to track and follow up on the assigned projects.
  • Cover all areas in application and infrastructure operations of the platform.
Requirements
  • You should be a university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
  • Strong communication skills and the ability to explain protocols and processes to the team and management.
  • A passion for learning and using new technologies in the open-source communities.
  • A passion for coding.
  • Min 10 years of IT work experience.
  • Working knowledge of AppDynamics, ELK Stack, Grafana, and OpenTelemetry (OTel).
  • In-depth experience in Unix/Linux/Shell/Python scripting with quality, scalability, and extensibility.
  • Experience in quickly triaging and troubleshooting application problems in monitoring tools, using techniques such as Transaction Snapshots, Diagnostic Sessions, and Data Collectors.
  • Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
  • Knowledge in Confluent Kafka, Prometheus & other APM tools (Dynatrace, Datadog, New Relic, Splunk) is a plus.
  • Knowledge of AI/ML capabilities to automate RCAs and shorten MTTR when issues arise.
  • Good understanding of network routing, load balancing and networking protocols; a base knowledge of TCP/IP, with an understanding of DNS.
  • Ability to contribute to discussions on design and strategy.
  • Adequate knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL), object-oriented programming and web application development.
  • Good problem diagnosis and creative problem-solving skills.
  • Experience in Node.js or Spring Boot is a plus.

We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.

  • Singapore Sea Limited Full time

    Engineering and Technology - Infrastructure, Singapore - Entry Level Our DevOps Engineering team plays an important role in developing and maintaining the internal systems and tools for the Infrastructure team. As a Site Reliability Engineer, you are responsible for improving the availability and reliability of our Infrastructure services. - Responsible for...


  • Singapore Retentia technology private limited Full time

    **3+ years of experience in Site Reliability Engineering, DevOps**, or a related field. - **Strong knowledge of cloud platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes).** - Experience with automation and configuration management tools (e.g., **Terraform, Ansible, Chef, or Puppet).** - Proficiency in at least **one programming...


  • Singapore The Edge Asia Full time

    Our client is a US hedge fund and their Technology group is constantly improving the company’s IT infrastructure, positioning them at the forefront of a rapidly evolving technology landscape. They are a team of experts experimenting, discovering new ways to harness the power of open-source solutions, and embracing enterprise agile methodology. Their...


  • Singapore Gravitas Recruitment Group Full time

    Job details - Location - Singapore - Salary - S$9000 - S$13000 per month - Job Type - Permanent - Ref - BBBH137137_1690786002 - Posted - about 1 hour ago Job summary **Our client, a trading firm, is looking for a Site Reliability Engineer to join their team. They are seeking team players who demonstrate a creative approach to problem-solving and take...


  • Singapore IFUN GAMES Full time

    **Responsibilities** - Design, implement, and maintain tools and processes for monitoring, alerting, and incident response - Collaborate with developers to improve the design and operation of systems, with a focus on reliability, performance, and scalability - Participate in on-call rotations to respond to incidents and handle escalations - Analyze system...


  • Central Singapore Emprego SG Full time

    **Location** Singapore, Central Singapore **Job Type** Permanent **Salary** 9,000 - 15,000 Per **Date Posted** 5 hours ago Additional Details **Job ID** 16908 **Job Views** 1 Roles & Responsibilities **Objectives of this Role** - Run the production environment by monitoring availability and taking a holistic view of system health Improve...


  • Singapore Sea Limited Full time

    Engineering and Technology - Infrastructure, Singapore - Experienced (Individual Contributor) Our DevOps Engineering team plays an important role in developing and maintaining the internal systems and tools for the Infrastructure team. As a Senior Site Reliability Operation Engineer, you are responsible for improving the availability and reliability of our...


  • Singapore J P INFOTEC PTE. LTD. Full time

    **Site Reliability Engineer** **Responsibilities** - Support and/or own the deployment of global products including setting up production and internal environments - Provide 24/7 first line of Engineering support (via follow the sun teams in all regions) for any issues related to global product deployment, availability and internal operations support. -...


  • Singapore Experis Full time

    **Site Reliability Engineer**: - Location- Singapore- Job reference- BBBH133368_1699927914- Salary- S$6000 - S$7500 per month- Consultant name - Rajasekar Shirley Monisha Consultant contact no. - 6232 5244 - EA License No. - 02C3423 - Consultant Registration No. - R22106767 **Responsibilities**: - Responsible for deployment, change, issues triage and...


  • Singapore COMBUILDER PTE LTD Full time

    Roles & Responsibilities: We are seeking talented and driven professionals to join our Site Reliability Engineering (SRE) team. This role involves helping organizations enhance the availability, performance, and resilience of their applications and services through the deployment and administration of Observability Platforms. Key Responsibilities: Deploy and...


  • Singapore AKAMAI TECHNOLOGIES APJ PTE. LTD. Full time

    **Join our Site Reliability team**: **Help us shape the future of the Internet**: As a Senior Site Reliability Engineer, you will be responsible for: - Deploying, managing, and operating scalable, highly available, and fault-tolerant systems on the Akamai Zero Trust Cloud Platform - Analysing and improving security, stability, speed, and capacity of Akamai...


  • Singapore Ellwood Consulting Full time

    **Contract type**: Permanent **Location**: Singapore **Salary**: SGD8,000 - SGD12,000 per month **Contact name**: Roy Mok Zi An **Published**: about 3 hours ago Job description About our client For more than 60 years, my client has been a leader in the interactive and game industry. The company also creates, manufactures, and sells coin-operated...


  • Singapore FUNFLY PTE. LTD. Full time

    Roles & Responsibilities: Position Overview: As a site reliability engineer, you will be responsible for ensuring the smooth operation of game services by maintaining, monitoring, and responding to faults daily. You will develop automation tools to enhance operational efficiency and manage game servers for optimal performance. The role includes collaborating...


  • Singapore GK CONSULTING PTE. LTD. Full time

    Roles & Responsibilities: We're seeking an experienced Senior Site Reliability Engineer to ensure the reliability, availability, and performance of our cloud-based internet services. Key Responsibilities: 1. Own reliability, availability, and user experience for assigned cloud services 2. Develop and implement service governance initiatives to increase reliability...


  • Singapore Hays Full time

    **Your new company** *** One of the most famous Internet and video game companies in the world, they are expanding in the global market and increasing headcount in Singapore as a regional hub. They currently provide hundreds of PC and mobile games across a wide range of genres in over 200 countries. **Your new role** *** As a Site Reliability Engineer,...


  • Singapore NomiSo Full time

    **Lead Site Reliability Engineer** **Pay**:10,000-12,000 SGD/Month **About NomiSo**: NomiSo is a product and services engineering company. We are a team of Software Engineers, Architects, Managers, and Cloud Experts with expertise in Technology and Delivery Management. Our mission is to Empower and Enhance the lives of our customers through simple...


  • Singapore DT One Full time

    **About DT One** **Key Responsibilities** - Run the production environment by monitoring availability and taking a holistic view of system health - Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve - Improve reliability, quality, security, and...


  • Singapore Quess Corp Limited Full time

    **Job Information**: Industry **Insurance** *** Salary **6000 - 8000** *** Work Experience **2-4 Years** *** City **singapore** *** State/Province **singapore** *** Country **Singapore** *** Zip/Postal Code **189557** *** - Income IT is adopting site reliability engineering (SRE) principles to implement continuous operation support to business...


  • Singapore Visier Solutions Inc Full time

    **Visier is the leader in people analytics and we believe in a 'people-first' approach to business strategy. Our innovative technology transforms the way that organisations make decisions, allowing them to elevate their employees and drive better business outcomes. Embarking on an exciting new chapter in our growth story, we are looking for talented...


  • Singapore NodeFlair Full time

    **Job Summary**: **Salary** S$7,000 - S$9,000 / Monthly **Job Type** **Seniority** Mid **Years of Experience** At least 4 years **Tech Stacks** Analytics Spring Shell OOP Logstash Chef Puppet UNIX Kibana Grafana Linux kafka Springboot Ansible Node.js Elasticsearch Python **NTT DATA Singapore PTE Ltd is a wholly owned subsidiary of NTT DATA Corp, a part...