Reliability & Observability Senior Analyst (x8) - Data Centers

Sydney
Capital Executive Search
Posted: 12 June
Offer description

We are currently partnering with a rapidly growing AI infrastructure and cloud technology organisation to appoint 8x IOC Reliability & Observability Senior Analysts to join their Sydney-based Infrastructure Operations Centre.

This business operates some of the world's largest GPU compute environments, delivering AI training and inference capabilities at scale. With significant investment in high-performance computing infrastructure and a commitment to sustainable operations, they continue to expand their global footprint while building one of the most advanced AI cloud platforms in the market.

The Role

This position sits within a 24/7 Infrastructure Operations Centre (running on 8-hour shifts, 8am or 10am start) and plays a critical role in ensuring the reliability, observability and operational performance of large-scale GPU compute environments.

Working closely with engineering, infrastructure and operations teams, you'll be responsible for improving incident detection, alert quality and operational visibility across a highly complex production environment. In this role you will:

* Perform advanced Level 2 incident analysis across GPU clusters, networking and supporting infrastructure.
* Improve alert quality, routing, enrichment and monitoring effectiveness to reduce operational noise and accelerate response times.
* Maintain operational dashboards, reliability metrics and service health reporting used by both technical teams and leadership.
* Investigate recurring incidents and identify opportunities to improve detection, automation and operational workflows.
* Analyse GPU health, performance degradation and failure patterns to support proactive incident management.
* Work with AIOps-generated insights, validating automated detections and ensuring operational signals remain accurate and actionable.
* Support incident management, RCA processes and operational reporting activities.

Desired Background

* 2-5 years of experience within IOC, NOC, Site Reliability Engineering, Production Operations, Observability or Reliability-focused environments.
* Strong understanding of incident management, service reliability and operational performance metrics such as MTTD and MTTR.
* Experience working with Linux systems, infrastructure monitoring platforms and enterprise ITSM tooling.
* Exposure to large-scale distributed environments, cloud infrastructure, HPC environments or GPU-based compute platforms.
* Hands-on experience with observability tooling such as Splunk, Datadog or similar monitoring platforms.
* Ability to correlate logs, metrics and alerts across multiple technology domains to accelerate incident diagnosis and resolution.
* Experience improving alert quality, reducing false positives and optimising operational monitoring practices.
* Familiarity with automation, scripting or configuration-driven operational workflows.

The Opportunity

This is an opportunity to join a business operating at the forefront of AI infrastructure, high-performance computing and large-scale cloud operations. You'll gain exposure to cutting-edge GPU environments, advanced observability platforms and modern reliability engineering practices while helping shape the operational maturity of a rapidly scaling global technology organisation.

P.S. This company does not offer sponsorship. We are only reviewing applications from Australian citizens and permanent residents.
