Site Reliability Engineer (SRE/DevOps) – Engineering Productivity – Sydney

Sydney

Arista Networks

Posted: 11 February

Offer description

Join Arista Networks' Engineering Productivity (EngProd) team to design, build, and operate secure, scalable, and fault‐tolerant infrastructure in a hybrid cloud environment.

What You'll Do
* Build, safely and incrementally deploy, and operate critical production systems, with a focus on scalability, reliability, observability, performance, and security.
* Monitor, support, and improve the developer experience across services.
* Build automation to remove toil and efficiently operate production systems.
* Proactively monitor, respond to, and enhance alerts and set up automated alert handling.
* Create and maintain incident response runbooks.
* Triage platform and infrastructure issues, assist Arista software engineers, and engage with third-party vendor support.
* Write post‐mortem documents and build solutions to prevent incident recurrence.
* Plan and communicate maintenance windows on production systems.
* Work with product development teams to identify and resolve infrastructure bottlenecks.
* Survey and adopt infrastructure and platform best practices for building secure, scalable, fault-tolerant systems.
* Study the design and implementation details of open-source software (OSS) to triage and resolve issues more effectively.

Qualifications
* At least a BSc in Computer Science or Engineering plus 3 years of experience, or equivalent.
* Knowledge of Go, Python, or shell scripting for automation workflows.
* Experience with Linux (UNIX) administration and debugging.
* Hands‐on experience operating infrastructure at scale.
* Server provisioning experience, especially with storage and networking.
* Strong problem‐solving and software troubleshooting skills.
* Experience with infrastructure‐as‐code (e.g., Ansible).

Desired Skills
* Managing databases (MariaDB, PostgreSQL, MongoDB).
* Docker and virtualization (KVM, QEMU, Kata Containers).
* Monitoring stack (Prometheus, Loki, Tempo, InfluxDB, Grafana, Thanos).
* Elasticsearch cluster management.
* Artifactory and Docker registry management.
* CI/CD systems (ArgoCD, Spinnaker).
* Version control (Perforce, Gerrit).
* Infrastructure‐as‐code frameworks (Ansible).
* Managing large Java applications.
* Storage infrastructure (NAS, SAN, Ceph).

Additional Information

Please note: We are not engaging external recruiters for this role. Only direct applications will be considered.

Australian Work Rights

Only candidates with Australian Citizenship, Australian Permanent Residency, or another demonstrable legal entitlement to work in Australia for the duration of employment will be considered for this role.

Employment Details

Location: Sydney, New South Wales, Australia

Type: Full‐time

Salary: A$150,000.00 – A$170,000.00
