**Job Title:**
**Job Summary:**
We are seeking an experienced engineer to join our team in building and scaling large AI compute clusters. This is an exciting opportunity for senior engineers who have a passion for distributed computing and AI training infrastructure.
**Key Responsibilities:**
The ideal candidate will have strong systems engineering skills and experience with distributed computing and storage for AI workloads. Proficiency in GPU cluster management, including NVIDIA GPUs, Slurm, and Kubernetes, is required. The candidate should also have a deep understanding of distributed training frameworks and multi-cloud architectures (AWS, GCP, Azure, and emerging GPU clouds). Experience managing large-scale clusters, including team leadership, hiring, and scaling operations, is also required.
**What We Offer:**
Top-spec MacBook, plus a separate GPU cluster dev environment for each engineer
Weekly cash bonus when you work out 3+ times a week
Comprehensive health benefits, including a choice of Kaiser, Aetna OAMC, and HDHP (HSA-eligible) plans
20-year exercise window for options, the longest in the world
**About Us:**
We move fast. We ship weekly: new features, improvements, and fixes go live quickly. We test big. Every month, we stress-test with large groups of users face to face, gather real-world feedback, and iterate rapidly. Engineers may travel between SF and Sydney to run events and meet with clients.