VP of Site Reliability Engineering
3 days ago
Group Technology enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group Technology, we manage the majority of the Bank's processes and aspire to delight our business partners through our multiple banking delivery channels.
Job Objective
DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE will be responsible for ensuring high availability of the platform services and for making continuous improvements to increase the platform's efficiency and resiliency. The SRE will also develop automation to remove toil and increase the team's productivity.
Roles and Responsibilities
- Develop monitoring and onboarding guidelines for various applications using the observability platform stack, ensuring accurate monitoring and data collection.
- Drive observability standards, best practices, operations and processes for the enterprise in AppDynamics and other observability tools.
- Automate routine tasks and reporting processes in AppDynamics and other observability tools using APIs and scripting, reducing manual effort and improving efficiency.
- Identify and resolve performance issues through detailed analysis of transaction traces, application logs, and system metrics.
- Collaborate with stakeholders to define performance metrics and monitoring requirements aligned with business goals.
- Contribute to internal knowledge bases, create documentation, and share insights with the team to promote a culture of learning and collaboration.
- Design and implement monitoring solutions to track application performance, identifying bottlenecks and optimising system efficiency.
- Conduct performance tuning and capacity planning to ensure applications meet scalability and reliability requirements.
- Develop custom dashboards and reports to provide actionable insights and drive decision-making processes.
- Collaborate with development and operations teams to integrate the observability platform stack with CI/CD pipelines and other DevOps tools.
- Configure and fine-tune alerts to proactively detect and address performance issues before they impact end-users.
- Continuously review and enhance monitoring processes and methodologies to improve efficiency and effectiveness.
- Work with application teams to develop long-term monitoring strategies that align with business goals and technology roadmaps.
- Create data retention policies and access controls (RBAC) to manage user permissions.
- Perform application maintenance, including patching and upgrading controller versions, agents, etc., and ensure EOS/EOL compliance is maintained.
- Ensure on-time delivery of tasks and projects.
- Ensure continuous uptime of applications and services.
- Ensure no security or audit issues.
- Comply with bank standards to track and follow up on the assigned projects.
- Cover all areas in application and infrastructure operations of the platform.
Requirements
- You should be a university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
- Strong communication skills and the ability to explain protocols and processes to the team and management.
- A passion for learning and using new technologies in the open-source communities.
- A passion for coding.
- Min 10 years of IT work experience.
- Working knowledge of AppDynamics, ELK Stack, Grafana, and OpenTelemetry (OTel).
- In-depth experience in Unix/Linux, shell and Python scripting, with a focus on quality, scalability, and extensibility.
- Experience in quickly triaging and troubleshooting application problems in monitoring tools, using techniques such as transaction snapshots, diagnostic sessions, and data collectors.
- Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
- Knowledge of Confluent Kafka, Prometheus and other APM tools (Dynatrace, Datadog, New Relic, Splunk) is a plus.
- Knowledge of AI/ML capabilities to automate RCAs and shorten MTTR when issues arise.
- Good understanding of network routing, load balancing and networking protocols, including a base knowledge of TCP/IP and DNS.
- Ability to contribute to discussions on design and strategy.
- Adequate knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL), object-oriented programming and web application development.
- Good problem diagnosis and creative problem-solving skills.
- Experience in Node.js or Spring Boot is a plus.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.