Job Summary

The primary goal of this position is to proactively improve the performance, reliability, and developer experience of platform products and services across Kubernetes-based infrastructure running on-premises and in AWS/GCP environments.

Key Responsibilities

* Design, implement, and maintain developer-friendly tools to improve productivity, code quality, and deployment efficiency for Kubernetes-based workloads.
* Identify bottlenecks in integration and deployment pipelines and implement enhancements to support faster, more reliable deployments to on-premises and cloud Kubernetes clusters.
* Collaborate with development teams to enable self-service tooling for managing deployments, logs, and infrastructure resources in Kubernetes environments.
* Continuously improve build, test, and deployment automation for Kubernetes infrastructure across on-premises and cloud environments (AWS/GCP).
* Provide better visibility into Kubernetes environments through improved observability tools, dashboards, and metrics.
* Manage and improve Kubernetes orchestration across on-premises infrastructure and AWS/GCP clusters to ensure reliability, scalability, and consistency.
* Enhance observability by implementing robust monitoring, logging, and alerting solutions tailored to Kubernetes workloads, using tools such as Grafana and Loki or cloud-provider tools such as CloudWatch (AWS) and Stackdriver (GCP).
* Collaborate with Engineering Leadership to implement reliability engineering practices such as load testing, chaos testing, and recovery mechanisms for Kubernetes services.

Required Skills and Qualifications

* Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
* 2+ years of experience in software development, systems engineering, or a related field.
* Experience with automation tools like Ansible, Terraform, Helm, and ArgoCD.
* Proficiency in at least one programming language such as Go, Python, or Java.
* Knowledge of container platforms like Kubernetes.
* Expertise in observability tools like Grafana, Prometheus, and New Relic.
* Experience with cloud-native technologies and Linux fundamentals.

Benefits

* Salary continuance insurance.
* Additional 5 days of leave per year (conditions apply).
* NEP Travel benefits and discounts, including Qantas Club Membership.
* Discounts through Employment Hero Work app.
* Employee Assistance Programme.

About Our Process

We are committed to employing individuals who align with our values and meet the requirements of the role. As part of the recruitment process, we may conduct several checks to confirm an applicant's suitability for the role.