Job Title: AI Infrastructure Architect
This is a senior-level position that requires a combination of DevOps leadership, software engineering, and MLOps expertise.
As an AI Infrastructure Architect, you will be responsible for designing, implementing, and maintaining large-scale AI/ML platforms and infrastructure.
Key Responsibilities:
* AI Platform & DevOps (60%):
o Architect, deploy, and maintain GPU-accelerated Kubernetes clusters using Helm, NGC containers, and custom K8s operators.
o Build and maintain CI/CD pipelines to enable continuous delivery of both software and models.
o Automate infrastructure across AWS, Azure, and on-prem environments using Terraform or Pulumi.
* Model Lifecycle & MLOps (10%):
o Collaborate with data scientists to containerize, benchmark, and tune LLMs, diffusion models, and multimodal pipelines.
o Implement data governance and tracking for AI data pipelines.
o Maintain feature and vector stores, ensuring reproducibility and performance of AI applications.
* Hands-on Engineering (30%):
o Develop backend services and APIs in Python and C++, and optionally in TypeScript.
o Integrate components from the client's digital human ecosystem.
o Build reusable SDKs, CLI tools, and internal libraries to accelerate AI/ML workflows across teams.
Required Qualifications:
* 10+ years of experience building and operating production-grade software systems.
* 2+ years focused specifically on AI/ML platforms or infrastructure.
* Proven expertise in CI/CD, GitOps, Terraform, and Helm.
* Strong Kubernetes and Docker experience, including GPU workload scheduling.
* Advanced Linux administration skills and experience profiling GPU workloads.
* Expert-level Python plus one systems language.
You will split your time approximately as follows: 60% DevOps/AI infrastructure, 30% backend coding, and 10% MLOps/model lifecycle tasks. You will work with a variety of technologies, including Helm, NGC containers, Kubernetes, GitHub Actions, Jenkins, Argo CD, Terraform, Pulumi, CUDA, Triton, TensorRT-LLM, Riva, Tokkio, Maxine, Omniverse, and more.