Job Description:
We are seeking an experienced HPC Systems Administrator to join our dedicated on-site IT team. The successful candidate will be responsible for maintaining, enhancing and scaling our High-Performance Computing (HPC) environment.
Key Responsibilities:
* Administer and maintain HPC clusters, including compute nodes, storage systems, and networking
* Monitor system performance and ensure high availability of HPC resources
* Lead the deployment, configuration, and troubleshooting of HPC tooling (e.g. Slurm)
* Automate routine tasks using scripting languages and configuration management tools (e.g. Bash, Python, Ansible)
* Collaborate with the IT Manager to plan and implement improvements to the HPC infrastructure
* Build and maintain documentation for systems, processes, and configurations
* Provide sound technical support to users of the HPC environment
About the Role:
This is an outstanding opportunity for an experienced Linux Systems Administrator to play a key role in ensuring the reliability and efficiency of our HPC systems. You will work closely with our IT Manager to achieve this goal.
Required Skills and Qualifications:
* Proven experience as a Systems Administrator (or in a similar role) managing Linux HPC clusters with Slurm, and optimising their performance, reliability and resource utilisation for high-throughput computing workloads
* Strong scripting and automation skills, including Bash, Python and Ansible
* Strong experience in a Red Hat environment
* Hands-on experience managing HPC systems and tooling, such as job schedulers (e.g. Slurm), containerisation platforms (e.g. Singularity or Docker) and parallel filesystems
* Experience with monitoring tools (such as Prometheus) and performance tuning
* Excellent understanding of networking concepts and protocols
* Experience managing backup solutions and planning for disaster recovery
* Experience with Synopsys or other electronic design automation (EDA) tools
* Excellent problem-solving and communication skills, with the flexibility to adapt to changing priorities
* The ability to work both independently and collaboratively in a fast-paced environment
* A strong commitment to delivering a stable, high-performance service that reliably supports users and their workloads
Benefits:
* A comprehensive benefits package, including an annual bonus plan, private medical insurance, life insurance, and a contributory pension scheme
* Equity, so that the whole team can share in the company's long-term success
* 28 days' annual leave, plus bank holidays and enhanced family leave
* A diverse work environment that brings together experts in many fields (including software and hardware development, quantum information theory, physics and maths) and over 20 different nationalities
* A learning environment that encourages individual, team and company growth and development, including a regular programme of learning events, plus dedicated training and conference budgets
How to Apply:
Please upload a CV and covering letter by clicking 'Apply Now'. We review CVs as they arrive and begin interviewing as soon as we receive applications that look like a good match.