Are you a seasoned engineering leader with a passion for building world‑class observability across complex, high‑scale systems? We're looking for a Principal Observability Engineer to drive the strategy, standards, and technical excellence behind our organisation's next‑generation observability and AIOps capabilities.
In this role, you'll shape the future direction of our observability platforms, partner with cross‑functional teams across engineering and operations, and enable the business to proactively detect issues, optimise performance, and deliver exceptional reliability for mission‑critical systems.
What You'll Do
Lead the design, development, and rollout of advanced observability and AIOps platforms—spanning metrics, logs, traces, dashboards, and alerting.
Own and evolve the observability technology roadmap, keeping it aligned with organisational priorities.
Define and champion standards, frameworks, and best practices across engineering teams.
Continuously optimise system performance and keep observability practices in step with industry developments.
Architect scalable, resilient distributed systems to support high‑traffic, complex workloads.
Implement ML‑driven capabilities for anomaly detection, forecasting, and root‑cause analysis.
Manage large volumes of telemetry data while maintaining security, quality, and compliance.
Identify automation opportunities and develop intelligent auto‑remediation workflows.
What You'll Have
Extensive experience across software engineering, DevOps, SRE, or platform operations.
Strong hands‑on experience with observability ecosystems (e.g., Dynatrace, Sumo Logic, OpenTelemetry).
Proven ability to build and manage large‑scale observability platforms using ML and LLM‑based tooling.
Expertise in cloud monitoring, scalable telemetry pipelines, and distributed systems.
Proficiency with Kubernetes, Docker, Harness, and microservices architectures.
Deep understanding of enterprise‑scale cloud infrastructure, networking, and multi‑layer observability.
Demonstrated use of ML/GenAI for predictive monitoring, incident analysis, correlation, and summarisation.
Experience building auto‑healing workflows (e.g., using Ansible).
If you'd like the opportunity to lead observability strategy at enterprise level, apply now.