Storage Site Reliability Engineer - SKA Low Telescope
CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and pay our respects to their Elders past and present.
The opportunity
The SKA Observatory (SKAO) is a next-generation radio astronomy facility that will revolutionise our understanding of the Universe and the laws of fundamental physics. Enabled by cutting-edge technology, it promises to have a major impact on society, in science and beyond.
In Australia, the SKAO is collaborating with CSIRO to operate and support the construction of the low-frequency telescope (SKA-Low) in remote Western Australia on Wajarri Yamaji Country.
The SKA-Low Storage Site Reliability Engineer (SSRE) will be responsible for developing a compute storage solution for a large-scale, high-performance compute cluster for the SKA-Low Telescope. The solution will uphold and maintain storage platform stability, reliability and robustness.
Your duties will include:
Work with key SKAO stakeholders to develop and maintain a storage system for a large-scale (petabyte parallel filesystem), distributed high-performance compute cluster.
Implement and maintain SKA-Low storage platform stability, reliability and robustness.
Define, measure and refine Service Level Objectives (SLOs) and corresponding Service Level Indicators (SLIs) for the storage solution.
Implement and continuously improve monitoring systems that provide observability of storage system and service health and behaviour.
Location:
Perth, Western Australia
Tenure:
Indefinite – Full-Time, Part-Time or Job-Share
Reference:
99275
To be considered you will need:
A tertiary qualification in Computer Science, Software Engineering, or equivalent work experience.
Experience in the development, deployment and management of file, object and block multi-tiered storage solutions within a large-scale (petabyte parallel filesystem) clustered environment, using tools such as Lustre and / or Ceph.
Experience with, or insight into, high-performance networking and its adaptation and tuning to support large-scale network storage systems in an HPC environment.
Experience in using infrastructure provisioning tools (such as Ansible, Puppet).
Ability to communicate in a professional yet friendly and effective manner, both orally and in writing, to an audience that spans a wide range of cultures and backgrounds.
Experience with cold storage / tape library solutions.
Experience implementing SRE practices and procedures to deliver and maintain reliable and robust storage systems / services.
Experience with scripting and programming languages such as Bash and / or Python, and a willingness to learn new ones.
Demonstrated understanding of, and enthusiasm for, working to lean / agile principles.
Applications for this position are open to Australian and New Zealand citizens, Australian permanent residents, and applicants who hold, or are able to obtain, a valid working visa for the duration of the specified term. Appointment to this role is subject to provision of a national police check and may be subject to other security / medical / character requirements.
Flexible working arrangements
We work flexibly at CSIRO, offering a range of options for how, when and where you work.
About CSIRO
At CSIRO, Australia's national science agency, we solve the greatest challenges through innovative science and technology. We put the safety and wellbeing of our people above all else and earn trust everywhere because we only deal in facts. We collaborate widely and generously and deliver solutions with real impact.
How to apply
To apply for this role, please apply online, providing your CV and a cover letter that clearly addresses the essential criteria of this role and your motivation for applying. Under CSIRO policy, only those who are able to demonstrate how they can meet the essential criteria may be appointed.