Sr. Site Reliability Engineer

7 days ago


Singapore Visa Full time

Company Description

Visa is a world leader in digital payments, facilitating more than 215 billion payments transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive.

When you join Visa, you join a culture of purpose and belonging - where your growth is priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere, uplift everyone everywhere. Your work will have a direct impact on billions of people around the world - helping unlock financial access to enable the future of money movement.

**Join Visa: A Network Working for Everyone.**

**Job Description**:
Product Reliability Engineering (PRE) is part of Visa's technology organization. The division is responsible for maintaining and supporting Visa's data assets, and it supports value-added products and services that drive innovation for our partners and clients, within Visa and globally. The Product Reliability Engineering Big Data Platform Team, part of PRE, supports open source Big Data and Kafka clusters at Visa.

As a Senior Big Data Engineer, you will be responsible for monitoring, troubleshooting, automating and continuously developing software tools to improve the availability and resiliency of open source Big Data platforms at Visa. In this hands-on role, you will administer open source big data platforms, ensuring their performance and reliability and increasing their operational efficiency.

Key Responsibilities:
Perform Big Data administration and engineering activities on multiple open source Hadoop, Kafka, HBase and Spark clusters.
Apply strong troubleshooting and debugging skills.
Collaborate across teams: build and maintain relationships with customer teams, the user community, architects, and engineering teams, and work jointly on key deliverables to ensure production scalability and stability.
Perform effective root cause analysis of major production incidents and develop learning documentation.
Identify and implement high-availability (HA) solutions for services with a single point of failure (SPOF).
Plan and perform capacity expansions and upgrades in a timely manner, avoiding scaling issues and bugs.
Automate repetitive tasks to reduce manual effort and avoid human error.
Tune alerting and set up observability to proactively identify issues and performance problems.
Work closely with L-3 teams to review new use cases and cluster hardening techniques for building robust and reliable platforms.
Leverage DevOps tools, disciplines (incident, problem and change management) and standards in day-to-day operations.
Ensure the Hadoop platform can effectively meet performance and SLA requirements.
Perform security remediation, automation and self-healing as required.

This is a hybrid position. Hybrid employees can alternate between working remotely and working in the office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

**Qualifications**:
Basic Qualifications:
2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience.
Hands-on experience working as a Hadoop system engineer managing Hadoop platforms.
Experience in building, managing and tuning the performance of Hadoop platforms.
Extensive knowledge of the Hadoop ecosystem, such as ZooKeeper, HDFS, YARN, Hive and Spark.
Excellent Shell and Python programming skills for automating repetitive DevOps tasks.
Understanding of security tools like Kerberos and Ranger.
Experience with the Hortonworks distribution or open source preferred.
Knowledge of Kafka, HBase and Kubernetes is a plus.
Understanding of Linux, networking, CPU, memory and storage.
Knowledge of Java and Python is good to have.
Excellent interpersonal, verbal, and written communication skills.
This position is not ideal for a Hadoop developer.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.


