VP of Site Reliability Engineering

3 days ago


Singapore DBS Bank Limited Full time
Business Function

Group Technology enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group Technology, we manage the majority of the Bank's processes and aspire to delight our business partners through our multiple banking delivery channels.

Job Objective

DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and for making continuous improvements to increase the platform's efficiency and resiliency. The SRE engineer will also develop automation to remove toil and increase the team's productivity.

Roles and Responsibilities
  1. Develop monitoring and onboarding guidelines for various applications using the observability platform stack, ensuring accurate monitoring and data collection.
  2. Drive observability standards, best practices, operations and processes for the enterprise in AppDynamics and other observability tools.
  3. Automate routine tasks and reporting processes in AppDynamics and other observability tools using APIs and scripting, reducing manual effort and improving efficiency.
  4. Identify and resolve performance issues through detailed analysis of transaction traces, application logs, and system metrics.
  5. Collaborate with stakeholders to define performance metrics and monitoring requirements aligned with business goals.
  6. Contribute to internal knowledge bases, create documentation, and share insights with the team to promote a culture of learning and collaboration.
  7. Design and implement monitoring solutions to track application performance, identifying bottlenecks and optimising system efficiency.
  8. Conduct performance tuning and capacity planning to ensure applications meet scalability and reliability requirements.
  9. Develop custom dashboards and reports to provide actionable insights and drive decision-making processes.
  10. Collaborate with development and operations teams to integrate the observability platform stack with CI/CD pipelines and other DevOps tools.
  11. Configure and fine-tune alerts to proactively detect and address performance issues before they impact end-users.
  12. Continuously review and enhance monitoring processes and methodologies to improve efficiency and effectiveness.
  13. Work with application teams to develop long-term monitoring strategies that align with business goals and technology roadmaps.
  14. Create data retention policies and access controls (RBAC) to manage user permissions.
  15. Perform application maintenance and patching, upgrade controller versions, agents, etc., and ensure EOS/EOL compliance is maintained.
Deliverables
  1. Ensure on-time delivery of tasks and projects.
  2. Ensure continuous uptime of applications and services.
  3. Ensure no security or audit issues.
Job Dimensions
  1. Comply with bank standards to track and follow up on the assigned projects.
  2. Cover all areas in application and infrastructure operations of the platform.
Requirements
  1. You should be a university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
  2. Strong communication skills and the ability to explain protocols and processes to the team and management.
  3. A passion for learning and using new technologies in the open-source communities.
  4. A passion for coding.
  5. Min 10 years of IT work experience.
  6. Working knowledge of AppDynamics, the ELK Stack, Grafana, and OpenTelemetry (OTel).
  7. In-depth experience in Unix/Linux/Shell/Python scripting, with attention to quality, scalability, and extensibility.
  8. Experience in quickly triaging and troubleshooting application problems in monitoring tools using various techniques: Transaction snapshots, Diagnostic Sessions, Data Collectors.
  9. Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
  10. Knowledge of Confluent Kafka, Prometheus and other APM tools (Dynatrace, Datadog, New Relic, Splunk) is a plus.
  11. Knowledge of AI/ML capabilities to automate RCAs and shorten MTTR when issues arise.
  12. Good understanding of network routing, load balancing and networking protocols; a base knowledge of TCP/IP and an understanding of DNS.
  13. Ability to contribute to discussions on design and strategy.
  14. Adequate knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL), object-oriented programming and web application development.
  15. Good problem diagnosis and creative problem-solving skills.
  16. Experience with Node.js or Spring Boot is a plus.

We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.

  • Singapore OCBC Full time

    Job Description: We are seeking a Site Reliability Engineer Leader to join our team at OCBC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our infrastructure. This role requires strong expertise in automating releases, continuous integration/delivery systems, and relevant infrastructure...


  • Singapore COMBUILDER PTE LTD Full time

    Roles & Responsibilities: We are seeking talented and driven professionals to join our Site Reliability Engineering (SRE) team. This role involves helping organizations enhance the availability, performance, and resilience of their applications and services through the deployment and administration of Observability Platforms. Key Responsibilities: Deploy and...


  • Singapore FUNFLY PTE. LTD. Full time

    Roles & Responsibilities. Position Overview: As a site reliability engineer, you will be responsible for ensuring the smooth operation of game services by maintaining, monitoring, and responding to faults daily. You will develop automation tools to enhance operational efficiency and manage game servers for optimal performance. The role includes collaborating...


  • Singapore GK CONSULTING PTE. LTD. Full time

    Roles & Responsibilities: We're seeking an experienced Senior Site Reliability Engineer to ensure the reliability, availability, and performance of our cloud-based internet services. Key Responsibilities: 1. Own reliability, availability, and user experience for assigned cloud services. 2. Develop and implement service governance initiatives to increase reliability...


  • Singapore TRINITY CONSULTING SERVICES PTE. LTD. Full time

    Roles & Responsibilities: · Must have minimum 5 years' experience. · Strong technical knowledge and experience in supporting enterprise-level applications. · Proficiency in troubleshooting application issues, performing log analysis, and using monitoring tools. · Experience with databases and SQL query language. · Familiarity with software development life...


  • Singapore FLOWDESK ASIA PTE. LTD. Full time

    Roles & Responsibilities. About the job: Are you passionate about maintaining robust and high-performing infrastructures? Do you thrive in managing complex network environments and ensuring system reliability? Join our infrastructure team and help us elevate operational excellence to new heights. As a Site Reliability Engineer at Flowdesk, you will be at the heart...


  • Singapore HELLO PLANET PTE. LTD. Full time

    Roles & Responsibilities: We are a global dating app created to give everyone a chance at love. The sense of belonging and connectedness we get from relationships helps us survive and thrive, and we're working to make it a little easier for people to find that. We're inspired by the stories we hear from employees, friends, and family who have used our app to...


  • Singapore PATSNAP PTE. LTD. Full time

    Roles & Responsibilities. About the Role: We are looking for a skilled and experienced DevOps Engineer / Site Reliability Engineer (SRE) to ensure the high availability, stability, and performance of our business platform. This role will be responsible for designing and implementing scalable and maintainable DevOps architecture and automation systems to...


  • Singapore TIKTOK PTE. LTD. Full time

    Roles & Responsibilities: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo. Why Join Us: At TikTok, our people are humble, intelligent, compassionate and creative. We create...


  • Singapore TOSS-EX PTE. LTD. Full time

    Roles & Responsibilities. Job Purpose: The Site Reliability Engineer (SRE) combines software development and system engineering to build and run distributed solutions in a secured multi-tier heterogeneous environment to safeguard, provide and continuously improve the software and systems behind the organization's cloud platform...


  • Singapore SOURCEO PTE. LTD. Full time

    Roles & Responsibilities. Required Expertise and Experience: At least 3 years of experience in SRE, DevOps, or a related engineering role. Proficiency in Infrastructure as Code (IaC) using Terraform to manage complex infrastructure. Hands-on experience with log analytics and observability tools, including ELK (Elasticsearch, Logstash, Kibana) and the Grafana...


  • Singapore GOCODE PTE. LTD. Full time

    Roles & Responsibilities. Job Highlights: Professional Growth, Collaborative Environment, Positive Company Culture. Job Description: Collaborate with various teams, including Development/Infra/Products, to ensure successful delivery, maintenance planning and correction of build errors. Day-to-day monitoring, backup, deployment and maintenance of systems. ...


  • Site engineer

    6 days ago


    Singapore VSM ENGINEERING PTE. LTD. Full time

    Roles & Responsibilities: Direct and oversee electrical and ELV engineering projects at construction sites, resolving issues and ensuring that work is completed according to specifications. Electrical site engineers balance project management and engineering tasks ranging from designing electrical plans to monitoring contractors. They also ensure that plans...

  • site engineer

    6 days ago


    Singapore ENSAFE ENGINEERING PTE. LTD. Full time

    Roles & Responsibilities: Work closely with the Site Manager on the day-to-day running of the site. Site supervision of sub-contractors and workers. Ensure safe work procedures and control measures at site. Preferably 1 - 5 years' working experience in the building construction industry / project site work. Manage and control workers and monitor...


  • Singapore TRITON AI PTE. LTD. Full time

    Roles & Responsibilities. What's on Offer: Competitive Salary – Up to SGD 6,000 per month + AWS + Variable Bonus; Work Location – Jurong Island (Transport provided at designated points); Work Schedule – Monday to Friday, 8:30 AM – 5:00 PM; Career Growth – Opportunity to lead high-impact sustainability and reliability initiatives. Key...

  • Site Engineer

    6 days ago


    Singapore HONG AIK ENGINEERING PTE. LTD. Full time

    Roles & Responsibilities. Job Description: The Site Engineer will be responsible for managing day-to-day activities on an infrastructure construction project site. They will ensure the project is executed according to plans, specifications, and safety standards, while assisting in resource management and quality control. The Site Engineer will collaborate with...

  • Senior Manager

    3 weeks ago


    Singapore STARHUB LTD. Full time

    Roles & Responsibilities: The Senior Manager, Site Reliability Engineering (SRE) Operations Analyst is expected to effectively handle incident retrospective operations and take part in other SRE activities in general that pertain to maintenance management, including availability, latency, performance, change management, monitoring, capacity planning & also the solutions...