Machine Learning Engineer - Distributed ML Systems

Brisbane
Pluralis Research
Posted: 18 March
Offer description

Overview

Pluralis Research carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.

We're looking for Senior/Staff engineers with 5+ years of experience in distributed systems and large-scale ML training. You'll be implementing a novel substrate for training distributed ML models over consumer-grade internet connections.

Responsibilities

Distributed Training Architecture & Optimization

* Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
* Develop and optimize parallel training strategies (data, tensor, and pipeline parallelism) with custom sharding techniques that minimize communication overhead.
* Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
* Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
* Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.

Decentralized Networking & Resilience

* Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
* Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
* Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
* Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

What You'll Bring

* Strong experience building and operating distributed systems in production.
* Hands-on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).
* Deep understanding of parallelism strategies (data, tensor, and pipeline parallelism).
* Expert-level Python with production experience (concurrency, error handling, retry logic, clean architecture).
* Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.
* Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.

What We Offer

* Equity-heavy compensation with meaningful ownership in a mission-driven company
* Competitive base salary for senior engineering roles in Australia
* Visa sponsorship available for exceptional candidates
* Remote-first with optional access to our Melbourne hub
* World-class team: teammates were previously at Google, Amazon, Microsoft, and leading startups

Backed by Union Square Ventures and other tier-1 investors, we're a world-class, deeply technical team of ML researchers and engineers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible way to prevent a handful of massive corporations from monopolising model development, access, and release, and from achieving massive economic capture. If this resonates, please apply.

© 2026 Jobstralia - All Rights Reserved

Send an application
Create a job alert
Alert activated
Saved
Save