Healthcare Data Engineer Needed
Job Overview
We are seeking a skilled and experienced Data Engineer to develop our cloud-based machine learning platform for medical devices.
This role requires in-depth knowledge of cloud-native data architectures, healthcare data standards, and regulatory requirements. The successful candidate will collaborate closely with data scientists, ML engineers, software developers, and other team members to design and implement data pipelines that handle multimodal healthcare data within a regulated environment.
Key Responsibilities:
* Design scalable and secure data pipelines for ingesting, transforming, and storing multimodal healthcare data, including imaging (e.g., DICOM), structured EHR data, unstructured clinical notes, and outcomes data.
* Develop and maintain data lake and warehouse architectures using modern cloud technologies.
* Ensure data privacy, integrity, and compliance with healthcare regulations such as HIPAA, GDPR, and FDA guidelines.
* Collaborate with ML and software engineering teams to enable model training, deployment, and monitoring workflows.
* Develop tools for data validation, lineage tracking, and auditability.
* Support ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes, metadata management, and data cataloging.
* Contribute to documentation, code reviews, and agile development practices.
Requirements:
* Bachelor's or master's degree in computer science, engineering, or a related field.
* 5+ years of experience in data engineering, preferably in healthcare or regulated industries.
Desirable Skills:
* Proficiency in Python, SQL, and data pipeline frameworks (e.g., Data Version Control (DVC), Apache Airflow).
* Experience with cloud platforms (AWS, Azure, or GCP), cloud-native data services, and containerization (e.g., Docker).
* Strong understanding of data modeling, schema design, and data governance.
* Familiarity with ML workflows and tools (e.g., TensorFlow, PyTorch, MLflow, DVC).
* Experience handling PHI/PII and sensitive healthcare data securely and compliantly.
* Experience with medical imaging formats (e.g., DICOM), healthcare data standards (e.g., FHIR, HL7), and EHR/EMR systems.
* Knowledge of DevOps practices, CI/CD pipelines, infrastructure-as-code (Pulumi preferred), and cloud applications (AWS preferred).
* Exposure to real-time data streaming (e.g., Kafka, Kinesis).
* Understanding of security and privacy best practices in medical data handling.
* Excellent communication skills, adaptability, and ability to work across a multidisciplinary team.