CS/CE Software Engineer (Systems Architect), Inference — Tech Lead (contract)
About the Team
We bring state-of-the-art AI to users running locally on a single computer: laptops and workstations. Our focus is extreme performance on constrained hardware: tight memory and compute budgets with small batches.
About the Role
We’re looking for a hands-on Tech Lead to design, optimise, and scale a single-machine inference engine. You’ll own performance from the kernels to the APIs: memory hierarchy, storage-backed loading, quantisation, and low-latency decoding. You’ll collaborate across research and product to turn novel/experimental concepts into functional and reliable local runtimes.
In this role, you will
- Design and implement core inference infrastructure for running large LLMs efficiently on a single computer (CPU+GPU).
- Write and optimise compute kernels (e.g., CUDA/HIP/Metal) and host-side runtimes to minimise pre-fill latency and maximise decoding throughput (tokens/sec).
- Engineer memory across the full hierarchy (GPU shared/global memory, CPU RAM, NVMe), using caching, prefetching, and optimized layouts.
- Build storage-accelerated paths: mmap/lazy loading, NVMe streaming, and paging strategies for weights and KV caches.
- Integrate quantisation/compression (e.g., INT8, HQQ) with careful accuracy–performance trade-offs, measured via standard LLM evaluation sets (e.g., MMLU).
- Develop observability and profiling for local runs (Nsight, flamegraphs) to detect bottlenecks and guide optimization.
- Collaborate with researchers to apply novel optimization strategies to a production-ready custom inference engine.
- Mentor engineers on kernel optimisation, memory management, and single-host performance.
You may thrive in this role if you
- Have deep experience with high-performance/low-latency systems and GPU/CPU kernel optimisation.
- Understand the LLM inference stack end-to-end: model formats, loading, memory residency, KV-cache management, sampling/decoding, and host–device transfers.
- Are fluent in memory-hierarchy thinking (locality, tiling, coalescing, cache utilisation) and I/O-aware design (overlap compute with reads).
- Are comfortable squeezing performance on a single box: small-batch efficiency, CUDA graph optimization.
- Work proficiently in C/C++ and have the ability to expose clean Python bindings.
Education
- BS/MS/PhD in Computer Science, Computer Engineering, or equivalent practical experience; systems/HPC focus (not web-dev–centric).
Nice to have
- Experience with cluster or local engines: ktransformers, llama.cpp, Ollama, vLLM, SGLang.
- CPU vectorisation (AVX-512/AMX) and custom thread pools/schedulers.
- Speculative/look-ahead decoding, flash/paged-attention variants.
Compensation
~$50-100 USD per hour + Equity offered
Fixed term (3-6 months), part-time or full-time
Location
Remote; core collaboration hours aligned to Melbourne (AEST)
How we work
We treat the engine as a whole system and measure before we modify. Every change targets a verified bottleneck, and we balance contention across compute, memory, I/O, and caches to maximize throughput. Our engineers reason end-to-end, from low-level implementation and hardware limits to system-level behavior, using rigorous benchmarks and data to drive iteration. We value engineers who aren’t afraid to dive into assembly or flamegraphs when microseconds matter.
Seniority level
Mid-Senior level
Employment type
Contract
Job function
Engineering and Information Technology
Industries
Desktop Computing Software Products