Site Reliability Developer 3
9 hours ago
As a Senior Network Reliability Engineer on the OCI Network Availability team, you will play a crucial role in ensuring the high availability and performance of Oracle Cloud's global network infrastructure. This role involves applying engineering methodologies to measure, monitor, and automate the reliability of OCI's network, supporting millions of users across a vast, distributed environment.
You will be part of a fast-paced, innovative team responsible for swiftly responding to network disruptions, identifying root causes, and collaborating with both internal and external stakeholders to restore services. Your work will also focus on automating daily operations, improving workflow efficiency, and optimizing network performance. With OCI's expansive global footprint, you will manage hundreds of thousands of network devices across a mix of dedicated backbone infrastructure, Clos networks, and the internet.
Responsibilities
Support and Operate OCI's Global Network: Design, deploy, and manage large-scale network solutions that power Oracle Cloud Infrastructure (OCI), ensuring reliability and performance at a global scale.
Collaborate and Drive Change: Use best practices and tools to develop and execute network changes safely. Work closely with cross-functional teams to continuously improve network performance.
Incident Response and Troubleshooting: Lead break-fix support for network events, provide escalation for complex issues, and perform post-event root cause analysis to prevent future disruptions.
Automation and Efficiency: Create and maintain scripts to automate routine network tasks, working with business units and teams to streamline operations and increase productivity.
Mentorship and Knowledge Sharing: Guide and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence.
Network Monitoring and Performance Analysis: Collaborate with network monitoring teams to gather telemetry data, build dashboards, and set up alert rules to track network health and performance.
Vendor Collaboration: Work with network vendors and technical account teams to resolve network issues, qualify new firmware/operating systems, and ensure the network ecosystem's stability.
On-Call Support: Participate in the on-call rotation to provide after-hours support for critical network events, ensuring that operational excellence is maintained 24/7.
Experience:
Experience working in a large-scale ISP or cloud provider environment, supporting global network infrastructure.
Prior experience in a network operations role, with a proven track record of handling complex network events.
Technical Skills:
Strong proficiency in network protocols and services, including MPLS, BGP, OSPF, IS-IS, TCP/IP, IPv4/IPv6, DNS, DHCP, VXLAN, and EVPN.
Extensive experience with network automation, scripting, and data center design. Python is preferred, though expertise in other scripting or compiled languages is a plus.
Hands-on experience with network monitoring and telemetry solutions, with the ability to leverage these tools to drive improvements in network reliability.
Familiarity with network modeling and programming, including YANG, OpenConfig, and NETCONF.
Problem-Solving and Collaboration:
Ability to apply engineering principles to resolve complex network issues, collaborating across teams to deliver effective solutions.
Strong communication skills, both written and verbal, with the ability to present technical information clearly to both technical and non-technical stakeholders.
Demonstrated experience in influencing product roadmap decisions, priorities, and feature development through sound judgment and technical expertise.
What We Offer:
Impactful Work: Work on projects that influence the future of cloud technology, supporting millions of users and businesses globally.
Innovation-Driven Culture: Be part of a team that thrives on creativity, continuous learning, and pushing the boundaries of what's possible.
Career Growth: We're committed to your professional development and offer opportunities to expand your skills and take on new challenges.
Collaborative Environment: Join a diverse, supportive team where autonomy and innovation are encouraged, and your contributions are valued.
Additional Information:
This role involves participation in an on-call rotation, providing 24/7 support for critical network events and incidents.
You will have the opportunity to work in a highly dynamic environment with exposure to cutting-edge technologies and large-scale cloud infrastructure.
Qualifications
Career Level - IC3