Job Opportunity:
We are seeking a skilled software development engineer to lead the design and implementation of next-generation model serving infrastructure. The project focuses on developing large-scale generative AI applications that run efficiently on AWS silicon.
Key Responsibilities:
* Design and architect distributed machine learning serving systems optimized for generative AI workloads
* Drive technical excellence in performance optimization and system reliability across the Neuron ecosystem
* Develop and optimize scalable solutions for both offline and online inference workloads
* Lead integration efforts with frameworks such as vLLM, SGLang, Torch XLA, TensorRT, and Triton
* Mentor team members and provide technical leadership across multiple work streams
* Collaborate with customers and engineering teams to define technical strategy
Requirements:
* Proficiency in software development and architecture principles
* Experience with large-scale distributed systems and machine learning frameworks
* Strong understanding of cloud-based services and AWS silicon
* Excellent communication and collaboration skills
About the Team:
The Neuron Serving team is committed to delivering efficient, reliable, high-quality model serving solutions. We value innovation, teamwork, and continuous improvement in how we work.