Sr. Site Reliability Engineer

5 days ago


Singapore Visa Full time

Company Description

Visa is a world leader in digital payments, facilitating more than 215 billion payments transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive.

When you join Visa, you join a culture of purpose and belonging, where your growth is a priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere uplift everyone everywhere. Your work will have a direct impact on billions of people around the world, helping unlock financial access to enable the future of money movement.

**Join Visa: A Network Working for Everyone.**

**Job Description**:
Product Reliability Engineering (PRE) is part of Visa's technology organization. The division is responsible for maintaining and supporting Visa's data assets, and it supports value-added products and services that drive innovation for our partners and clients, within Visa and globally. The Product Reliability Engineering Big Data Platform Team, part of PRE, supports open source Big Data and Kafka clusters at Visa.

As a Senior Big Data Engineer, you will be responsible for monitoring, troubleshooting, automating and continuously developing software tools to improve the availability and resiliency of open source Big Data platforms at Visa. In this hands-on role, you will administer open source big data platforms, ensuring their performance and reliability and increasing their operational efficiency.

Key Responsibilities:
Perform Big Data administration and engineering activities on multiple open source Hadoop, Kafka, HBase and Spark clusters.
Apply strong troubleshooting and debugging skills.
Work across teams: build and maintain relationships with customer teams, the user community, architects and engineering teams, and jointly work on key deliverables to ensure production scalability and stability.
Perform effective root cause analysis of major production incidents and develop learning documentation.
Identify and implement high-availability (HA) solutions for services with a single point of failure (SPOF).
Plan and perform capacity expansions and upgrades in a timely manner, avoiding scaling issues and bugs.
Automate repetitive tasks to reduce manual effort and avoid human error.
Tune alerting and set up observability to proactively identify issues and performance problems.
Work closely with L-3 teams to review new use cases and cluster hardening techniques for building robust and reliable platforms.
Leverage DevOps tools, disciplines (incident, problem and change management) and standards in day-to-day operations.
Ensure the Hadoop platform can effectively meet performance and SLA requirements.
Perform security remediation, automation and self-healing as required.

This is a hybrid position. Hybrid employees can alternate time between both remote and office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

**Qualifications**:
Basic Qualifications:
2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience.
Hands-on experience as a Hadoop system engineer managing Hadoop platforms.
Experience in building, managing and tuning the performance of Hadoop platforms.
Extensive knowledge of the Hadoop ecosystem, such as ZooKeeper, HDFS, YARN, Hive and Spark.
Excellent Shell and Python programming skills for automating repetitive DevOps tasks.
Understanding of security tools like Kerberos and Ranger.
Experience with the Hortonworks distribution or open source preferred.
Knowledge of Kafka, HBase and Kubernetes is a plus.
Understanding of Linux, networking, CPU, memory and storage.
Knowledge of Java and Python is good to have.
Excellent interpersonal, verbal, and written communication skills.
This position is not ideal for a Hadoop developer.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.


