Senior HPC Infrastructure Engineer (Compute System)

Sydney
Firmus Technologies
Infrastructure
Posted: 10 March
Offer description

Firmus is seeking a highly skilled and driven Kubernetes HPC Engineer to join our Software Defined Infrastructure team. In this role, you will build high-performance, fault-tolerant, and reliable infrastructure to support bare-metal provisioning, performance benchmarking, and platform validation.

You will be instrumental in ensuring the stability, performance, and continuous improvement of our complex and mission-critical bare-metal HPC GPU clusters.

Key Responsibilities

* Own the end-to-end lifecycle of AI compute systems, including GPU compute, NVSwitch, and platform firmware (BIOS, GPU, NIC, and storage devices).
* Define, maintain, and enforce supported firmware and driver compatibility matrices across hardware generations, operating systems, kernels, and AI software stacks.
* Lead firmware qualification and regression testing to ensure updates do not introduce performance degradation, instability, or compatibility issues.
* Investigate and remediate performance regressions caused by firmware, driver, or system-level changes, working closely with networking, storage, and HPC engineers.
* Collaborate to integrate firmware and performance checks into SDI tooling, enabling automated validation during provisioning, upgrades, and cluster bring‐ups.
* Produce clear technical documentation, including firmware standards, validation reports, and benchmarking results, to support operational consistency and informed decision‐making.
* Collaborate with L2 SRE engineers, site operations, and networking teams to ensure platform reliability, reproducibility, and performance.
* Support hardware bring‐up activities, including BIOS tuning, GPU topology verification, NUMA alignment, and PCIe/NVLink checks.
* Contribute to continuous improvement in cluster validation, CI/CD automation, and provisioning and testing frameworks.
* Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks that optimise AI clusters for large‐scale GPU cluster commissioning.

Skills & Experience

* Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
* Experience with bare‐metal cluster provisioning using tools such as Metal3, OpenStack Ironic, MaaS, xCAT, or similar.
* Hands‐on expertise with platform firmware and low‐level system components, including BIOS, BMC, GPU firmware, NIC firmware, and storage devices.
* Proven experience managing firmware and driver compatibility across operating systems, Linux kernels, and AI software stacks, with a disciplined approach to version control and validation.
* Solid understanding of GPU architecture and interconnects, including PCIe, NVLink, and GPU‐to‐GPU communication patterns.
* Demonstrated experience in performance benchmarking and validation using industry‐standard and custom tools to measure GPU, compute, storage, and interconnect performance.
* Strong Linux systems knowledge, including kernel behaviour, driver management, performance tuning, and troubleshooting at the OS and hardware boundary.
* Experience diagnosing and resolving performance regressions related to firmware, drivers, or system‐level changes in production or pre‐production environments.
* Strong automation mindset using tools such as Ansible, Helm, Terraform/OpenTofu, or equivalent.
* Understanding of firmware, BIOS, BMC/IPMI/Redfish, and low‐level system tuning.
* Proficiency in one or more programming languages such as Go, Bash, Rust, or Python.
* Excellent documentation skills with a high level of attention to detail.
* Experience participating in an on‐call rotation supporting production services.
* Proactive self‐starter with a drive for continuous technical improvement.
* Ability to understand AI compute platforms as end‐to‐end systems spanning hardware, firmware, operating systems, drivers, and workloads.
* Ability to anticipate cross‐layer impacts of changes and design solutions that optimise overall system performance and reliability.
* Ability to proactively identify risks related to firmware upgrades and ensure compatibility through structured validation and rollback strategies.
* Experience operating AI infrastructure at medium to large scale, with a focus on reliability, repeatability, and performance consistency.
* Strong sense of ownership and accountability for system performance and reliability.
* Comfortable operating in ambiguous, fast‐evolving environments while driving continuous improvement.

Success Metrics

* Reliable, automated firmware validation and upgrade systems and processes.
* Performance validation and optimisation.
* Improved operational efficiency.
* High‐quality documentation and effective knowledge transfer.

Location & Reporting

* Sydney, NSW or Hobart/Launceston, TAS
* Reporting to Senior Manager, Software Defined Infrastructure

Employment Basis

Full‐time

Diversity

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionise the AI industry through sustainable practices and cutting‐edge engineering. Apply now to help shape the future of sustainable AI infrastructure.



© 2026 Jobstralia - All Rights Reserved
