Site Reliability Engineer (SRE/DevOps) - Engineering Productivity - Sydney

Sydney

Arista Networks

Posted: 5 June

Offer description

Full-time position.

Arista Networks is a leading network provider that focuses on cloud computing and software‑defined networking.

Who You'll Work With

As part of the Engineering Productivity team, you will collaborate with software engineers to design, build, and operate secure, scalable, fault‑tolerant systems in a hybrid cloud environment.

What You'll Do

* Build, deploy, and operate critical production systems with a focus on scalability, reliability, observability, performance, and security.
* Monitor, support, and enhance developer experience across services.
* Build automation to reduce toil and efficiently operate production systems.
* Proactively monitor and respond to alerts, improve alert quality, and set up automated alert handling.
* Create and maintain incident response runbooks.
* Triage platform infrastructure issues, support software engineers in their own triage, and engage with third‑party vendor support.
* Write post‑mortem documents and build solutions to prevent incidents from recurring.
* Plan and communicate maintenance windows on production systems.
* Work with product development teams to identify bottlenecks in their workflows, then design and implement solutions.
* Survey and adopt best practices to maintain secure, scalable, and fault‑tolerant systems.
* Study the design and implementation details of OSS systems to triage and resolve issues more effectively.

Qualifications

* At least a BSc in Computer Science or Engineering (or equivalent) with 3 years of experience, OR an MS with 3 years of experience.
* Knowledge of one or more of Go, Python, or shell scripting for implementing medium‑complexity automation workflows.
* Proficient in Linux (or Unix) administration and debugging.
* Hands‑on experience operating software systems at scale.
* Experience in server provisioning (especially storage and networking).
* Strong problem‑solving and software troubleshooting skills.
* Experience with infrastructure‑as‑code.

Desired Skills

* Experience with Docker and virtualization technologies (e.g., KVM, QEMU, Kata Containers).
* Experience managing Elasticsearch clusters.
* Experience managing Artifactory and Docker registries.
* Experience with infrastructure‑as‑code frameworks like Ansible.
* Experience managing large Java applications.
* Experience in storage infrastructure management (NAS, SAN, Ceph).

Legal and Work Eligibility

Only candidates with Australian citizenship, permanent residency, or legal entitlement to work in Australia for the duration of employment will be considered.
