Overview
Halo Labs is a future-focused, end-to-end data solutions firm - transforming tomorrow, today. Our intelligent, secure technology systems and data-driven solutions deliver meaningful outcomes and unlock lasting value.
Why work at Halo Labs?
* Leading Innovation: We don't just solve problems; we generate a continuous stream of innovation.
* Exceptional Perks: Enjoy a dedicated learning budget, performance bonuses, and comprehensive wellbeing support.
* Remote-First Organisation: Experience the benefits of remote work, with the flexibility to travel to client locations throughout Australia.
* Inclusive and Engaging: Celebrate diversity in a welcoming, respectful environment that thrives on new ideas and open conversations.
* Inspiring Origins: With a compelling founder story, we are a customer-focused, culture-first organisation.
About the role
* Design, develop, and maintain reusable in-house PySpark frameworks to enforce standardised data engineering patterns across the SaaS platform (see the framework sketch after this list)
* Architect and implement scalable, production-grade ETL/ELT pipelines across AWS environments
* Build distributed data processing solutions using Python and PySpark on Databricks
* Develop batch and near real-time ingestion pipelines integrating third-party clinical systems, healthcare APIs, and external enterprise platforms (see the ingestion sketch after this list)
* Design secure data integration patterns (REST APIs, SFTP, event-driven ingestion, webhooks) ensuring compliance and data integrity
* Work closely with Software Engineers to embed data services directly into the SaaS product architecture
* Contribute to the overall solution architecture of the data platform, ensuring seamless integration between the application backend and the data layer
* Implement CI/CD pipelines using Git for automated deployment and testing of data workloads
* Apply infrastructure-as-code and environment management best practices across AWS
* Optimise Spark jobs, cluster configurations, and storage strategies for performance and cost efficiency (see the tuning sketch after this list)
* Design and maintain robust data models, including dimensional models and SaaS-oriented data schemas
* Implement data validation, monitoring, and alerting to ensure pipeline reliability and production stability (see the validation sketch after this list)
* Provide technical mentorship and enforce engineering standards across the analytics and data engineering team
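To give a flavour of the reusable PySpark framework work described above, here is a minimal sketch of a standardised transform pattern; the class names, audit columns, and source label are illustrative assumptions, not Halo Labs' actual framework:

```python
# A minimal sketch of a reusable PySpark transform framework.
# Names and columns are illustrative, not the firm's real in-house code.
from abc import ABC, abstractmethod

from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F


class Transform(ABC):
    """One named, reusable step in a standardised pipeline."""

    @abstractmethod
    def apply(self, df: DataFrame) -> DataFrame: ...


class AddAuditColumns(Transform):
    """Stamp rows with load time and source so every job follows one pattern."""

    def __init__(self, source: str):
        self.source = source

    def apply(self, df: DataFrame) -> DataFrame:
        return (df.withColumn("_loaded_at", F.current_timestamp())
                  .withColumn("_source", F.lit(self.source)))


class Pipeline:
    """Chain Transform steps so all teams assemble jobs the same way."""

    def __init__(self, steps: list[Transform]):
        self.steps = steps

    def run(self, df: DataFrame) -> DataFrame:
        for step in self.steps:
            df = step.apply(df)
        return df


if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    raw = spark.createDataFrame([(1, "a")], ["id", "value"])
    Pipeline([AddAuditColumns(source="clinic_api")]).run(raw).show()
```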
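The ingestion and secure-integration responsibilities might look like the following hedged sketch of a paginated REST pull that lands raw records in the lake; the endpoint, auth scheme, response shape, and bucket paths are all hypothetical placeholders:

```python
# A hedged sketch of batch ingestion from a third-party healthcare API.
# Endpoint, pagination style, and storage locations are hypothetical.
import os

import requests
from pyspark.sql import SparkSession


def fetch_pages(base_url: str, token: str) -> list[dict]:
    """Pull all pages from a paginated REST endpoint over TLS."""
    records, url = [], f"{base_url}/v1/observations"  # hypothetical path
    headers = {"Authorization": f"Bearer {token}"}
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        records.extend(body["items"])   # assumed response shape
        url = body.get("next")          # assumed cursor-style pagination
    return records


if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    rows = fetch_pages("https://api.example-clinical-system.com",
                       os.environ["API_TOKEN"])  # secret injected, never hard-coded
    if rows:
        # Land raw records in the lake; downstream jobs validate and model them.
        spark.createDataFrame(rows).write.mode("append").format("delta").save(
            "s3://example-bucket/raw/observations"  # hypothetical location
        )
```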
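Spark optimisation work of the kind listed above often comes down to avoiding shuffles and controlling file layout; the sketch below assumes illustrative Delta table paths and column names:

```python
# A small sketch of routine Spark tuning: broadcast the small dimension,
# then control output partitioning. Table paths and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

facts = spark.read.format("delta").load("s3://example-bucket/silver/encounters")
dims = spark.read.format("delta").load("s3://example-bucket/silver/clinics")

# Broadcasting the small dimension avoids shuffling the large fact table.
joined = facts.join(broadcast(dims), "clinic_id")

# Partitioning output by date keeps scans selective and files reasonably sized.
(joined.repartition("encounter_date")
       .write.mode("overwrite")
       .partitionBy("encounter_date")
       .format("delta")
       .save("s3://example-bucket/gold/encounters_by_clinic"))
```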
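Finally, a minimal sketch of the validation and alerting responsibility: fail a load fast when core expectations break. The specific checks and the alerting hook are illustrative assumptions:

```python
# A minimal sketch of pipeline validation: reject bad loads before publishing.
from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F


class DataQualityError(Exception):
    pass


def validate(df: DataFrame, key: str) -> None:
    """Reject loads with null or duplicate business keys."""
    nulls = df.filter(F.col(key).isNull()).count()
    dupes = df.groupBy(key).count().filter(F.col("count") > 1).count()
    if nulls or dupes:
        # In production this would also page on-call (e.g. via SNS) before raising.
        raise DataQualityError(f"{key}: {nulls} null keys, {dupes} duplicated keys")


if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    batch = spark.createDataFrame([(1, "a"), (1, "b")], ["patient_id", "value"])
    validate(batch, key="patient_id")  # raises: patient_id is duplicated
```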
About you
* Strong hands-on experience with AWS services relevant to modern data platforms (S3, Lambda, RDS, Glue, IAM, etc.)
* Advanced proficiency in Python, SQL, and PySpark for large-scale distributed data processing
* Deep experience configuring and managing Databricks clusters for scalable big data workloads
* Experience building production-ready data pipelines in a SaaS or product-led engineering environment
* Strong understanding of cloud-native data architecture, including data lakes, lakehouse architecture, and modular pipeline design
* Experience integrating with third-party systems via APIs and secure data exchange mechanisms
* Exposure to healthcare or regulated data environments, including handling sensitive data securely
* Strong knowledge of data modelling, metadata management, and data governance principles
* Experience implementing automated testing frameworks for data pipelines
* Solid understanding of DevOps practices including Git workflows, branching strategies, and CI/CD automation
* Degree in Computer Science, Engineering, Data Science, or related technical field