Senior Manager

2 months ago


Singapore, Singapore StarHub Full time

Job Description

The Senior Manager, Site Reliability Engineering (SRE) Operations Analyst is expected to run incident retrospectives effectively and to contribute to SRE activities in general. These cover maintenance management areas such as availability, latency, performance, change management, monitoring and capacity planning, as well as the solutions derived from emergency response.

Key Accountabilities

• Facilitate & conduct incident retrospective (RCA) activities effectively from end to end
• Absorb new technology rapidly & apply effectively
• Evaluate & demonstrate new cloud technologies as required
• Communicate well with technical & non-technical colleagues
• Mentor other colleagues as necessary
• Work to a high standard within agreed timescales
• Undertake any other tasks or duties that are reasonable & requested by your supervisor or a member of the senior management team
• Review code
• Apply knowledge in supporting "Run" operations
• Perform data analysis & suggest suitable Service Level Indicators & Service Level Objectives

Responsibilities

• Responsible for effectively facilitating the Problem Management Process
• Demonstrate authority in RCA calls while coordinating with other stakeholders & resolving discrepancies in a blameless manner
• Responsible for efficient allocation of time & resources given parallel major incidents and problem activities
• Point of contact for assigned higher-severity incidents, covering documentation and publishing from incident retrospective calls through to the management report
• Manage updates to systems such as the problem management module, internal SharePoint, etc.
• Propose & participate in enhancement activities related to SRE
• Collaborate with engineering teams within the organization and with LOBs on enabling activities as part of preventive measures
• Develop event management process, metrics, and governance model
• Perform trend analysis on events to identify potential issues/incidents
• Consolidate and analyze noise alerts to zoom in on actual issues
• Leverage GenAI and real-time data feeds to produce post-incident reports
• Implement automatic root cause identification, reducing turnaround for RCA Reports
• Coordinate & automate thematic and trend analysis of incidents using AI/ML
• Identify event/incident clustering for improvements

Qualifications

Requirements

Skills & Experience:

• Degree in IT, Computer Science or related field
• Minimum 10 years of root cause analysis (RCA) experience, including leading discussions as a problem manager or incident commander
• In-depth understanding of Public/Private/Hybrid cloud solutions
• Hands-on experience with popular CI/CD tools like Jenkins, Nexus, SonarQube, Bitbucket, etc.
• Good exposure to logging & monitoring tools like Splunk, Dynatrace, Prometheus, Grafana, ELF/ELK
• Good understanding of cloud-native technologies like containers, Kubernetes, etc.
• Develop & enhance production monitoring & management capabilities leveraging existing platforms & tools
• In-depth understanding of Incident & Problem Management functions & activities
• Good understanding of Identity and access management
• Software incident & problem management
• Work with stakeholders & command centre in troubleshooting, escalating & resolving critical site incidents
• Identify recurring system/application issues & work with cloud team, infra teams, product development, vendors & other stakeholders in investigating & resolving the cause
• Maintain accurate documentation of incidents including impact details, timelines, steps taken for mitigation/resolution
• Strong verbal & written communication skills, particularly effective documentation skills
• Prior experience in developing and implementing event management processes and governance models
• Strong analytical skills with the ability to interpret complex data sets
• Proficiency in event management tools and platforms
• Familiarity with ITIL (Information Technology Infrastructure Library) practices related to Incident Management, Problem Management, Change Management and Event management
• Experience with AI/ML technologies and their application in incident analysis

Desirable

• Minimum 6 years of software development, technical support or operations experience
• Experience with Jira, Confluence
• Basic knowledge of Linux/Windows
• Exposure to enterprise databases, e.g. Oracle, SQL Server, MariaDB, MongoDB & Sybase
• Knowledge of system, multi-tier application & network troubleshooting
• Experience with load balancing principles
• Essential knowledge & awareness of Public/Private/Hybrid cloud solutions
• Good exposure to logging & monitoring tools like Dynatrace, Prometheus, Grafana, ELG/ELK
• ITIL V4 certification preferred
• Trend Analysis and Forecasting
• Process Development and Governance
• Familiarity with GenAI (generative AI) or similar technologies
• Continuous Improvement Mindset

