Sydney, NSW, Australia (Hybrid)
Full Time
About the job
We're hiring an AI Researcher to drive advanced research in neural network and LLM optimization, identify the most promising opportunities, and translate them into production-ready innovations. You will evaluate and apply emerging techniques (pruning, quantization, inference acceleration, memory optimization, scheduling, runtime tuning) and determine what will materially improve LLM inference performance, model efficiency, and cost—then partner with engineering to ship the changes that make a measurable difference.
Responsibilities:
* Inference & Compute Optimization: Design and implement highly optimized inference pipelines and computational kernels to accelerate LLM and neural network workloads, leveraging low-level techniques such as SIMD vectorization, cache-aware memory access patterns, and hardware-specific tuning.
* Neural Network Compression & Model Optimization: Research and implement pruning, quantization, and other compression techniques to reduce model size and accelerate inference while preserving accuracy. Apply both in-training and post-training optimization methods across LLM and vision model workloads.
* Profiling & Observability: Build and utilize advanced profiling tools to identify bottlenecks across the inference and training stack—from memory bandwidth and cache utilization to CPU-side data preprocessing stalls and end-to-end pipeline throughput.
* Evaluation & Benchmarking: Design and maintain rigorous evaluation and benchmarking frameworks for systematic model comparison across optimization configurations. Develop automated pipelines (e.g., LLM-as-a-judge) to measure the impact of optimization techniques on model quality and performance.
* Mentorship: Act as a technical lead for engineers and researchers, fostering a culture of high-performance code, rigorous benchmarking, and research-to-production excellence. Drive team growth, technical interviews, and cross-functional collaboration.
Required Qualifications:
* Deep Systems Expertise: 8+ years of experience in high-performance computing, AI systems, or low-level software optimization. Deep familiarity with performance-critical development including CPU/GPU architecture, memory hierarchies, SIMD/vectorization, and profiling-driven tuning.
* LLM & NN Optimization Track Record: Proven experience optimizing neural networks and LLMs through techniques such as pruning, quantization, and inference acceleration, with a demonstrated path from research to production deployment.
* Communication: Ability to translate complex systems-level constraints and optimization trade-offs into actionable research directions for modeling and engineering teams.
Preferred Qualifications:
* Experience building evaluation frameworks, ML observability, or developer tools that help researchers understand and compare model performance across optimization configurations.
* A history of working on neural network compression, inference acceleration, or applied AI research problems that required bridging algorithmic research with high-performance implementation.
* Patent authorship or published research in AI/ML optimization.
* Experience with C/C++ inference engines, x86 intrinsics, or similar low-level performance work is a strong plus.
About the Company
Glasswing, an emerging force in technological efficiency, is transforming application performance through Kubernetes optimization, with a mission to reduce cloud costs and improve application efficiency by identifying the most impactful bottlenecks. Joining our team means working at the forefront of a solution that ensures seamless scalability and robust system performance, keeping user satisfaction high as businesses grow.
We are an early-stage startup backed by leading global venture capital, and we welcome innovative team members who are passionate about making a substantial impact in a dynamic and evolving technological landscape.