Job Details
Location: Sydney (4 days in office, 1 day WFH)
Reports to: Technical Operations Director, APAC
Department: Global Technical Operations
Salary: $165,000 + super + annual bonus
The Opportunity
A leading music organisation is growing its Global Technical Operations hub in Sydney and is looking for a Service Reliability Engineer (SRE) to join the team.
This is more than a traditional ops role: it’s an opportunity to bring a software engineering mindset to reliability, automation, and scalability in a global, high-impact environment.
What You’ll Do
You’ll join a collaborative, hands‑on team responsible for the stability, performance, and scalability of global platforms. Working closely with development, infrastructure, and security teams, you’ll help build a resilient environment that keeps music flowing, from studio tools to streaming systems.
Design and maintain high-availability, high-performance systems for global applications.
Automate everything, from infrastructure provisioning to deployment and scaling, using tools like Terraform, Ansible, and Python.
Build robust monitoring and observability frameworks with AWS CloudWatch, Dynatrace, Prometheus, Grafana, or Splunk.
Optimize CI/CD pipelines to improve reliability and deployment speed.
Participate in on‑call rotations, troubleshoot incidents, and lead post‑incident reviews.
Champion SRE principles: embed SLOs, SLIs, and error budgets into everyday engineering.
Collaborate across Dev, Infra, and Security teams to create a culture of continuous improvement and reliability.
About You
You’re a technically strong, level‑headed engineer who loves automation, thrives in complex environments, and knows how to balance pragmatism with perfection.
Background in systems administration (Linux/Windows) in a large-scale environment.
Proficient in at least one programming language (Python, Go, or Java).
Hands‑on experience with AWS (GCP or Azure is a bonus).
Deep understanding of networking, containers (Docker/Kubernetes), and Infrastructure as Code (Terraform, Ansible).
Experience with monitoring and observability tools such as Dynatrace, Prometheus, Grafana, or Datadog.
Calm, collaborative communicator with strong analytical and problem‑solving skills.
Bonus Points For
Experience with ServiceNow or ITIL processes.
Knowledge of chaos engineering, resilience testing, or advanced capacity planning.
Previous experience managing distributed, global systems in production.
Global collaboration and career growth opportunities
Interested? Apply now or contact Sophia Parrelli at Talent International for a confidential chat.