We're seeking an experienced Senior Data Engineer to help design, build, and optimise scalable, real-time data platforms on AWS. This role focuses on streaming data solutions using Spark Structured Streaming, Kafka, and modern cloud-native data services. You'll contribute to core platform capabilities including Iceberg-based data lakes, slowly changing dimensions, and automated deployment pipelines.
Key Responsibilities
* Develop and support real-time streaming pipelines using Apache Spark Structured Streaming integrated with Kafka (see the sketch after this list)
* Ingest, parse, and transform complex data sources, including XML as well as JSON and CSV formats
* Optimise Spark workloads using advanced techniques such as partitioning, broadcast joins, and caching
* Implement SCD Type 1 and Type 2 patterns within analytical and warehouse data models
* Design and manage end-to-end ETL orchestration using AWS-native services, including:
  * AWS Glue (event-driven and scheduled jobs)
  * Lambda, EventBridge, and Step Functions
* Work across the AWS data ecosystem, leveraging:
  * S3 and Athena for data lake storage and querying
  * DynamoDB for low-latency reference data
  * Glue for scalable, serverless Spark processing
* Build and maintain Apache Iceberg tables, handling schema evolution, compaction, snapshots, and time travel
* Write and maintain automated unit and integration tests for PySpark pipelines using PyTest
* Develop and test Glue workloads locally using Docker and PySpark
* Implement and manage CI/CD pipelines for data workloads using Jenkins, GitHub, and JFrog Artifactory
* Partner closely with solution architects and business stakeholders to deliver reliable, scalable, and cost-effective data solutions
* Work within Databricks environments, including Jobs, Notebooks, Runtime, and Delta/Iceberg integrations
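
To give candidates a flavour of the day-to-day work, here is a minimal sketch of a streaming pipeline in the spirit of this role: a Structured Streaming job that reads from Kafka, parses JSON, and appends to an Iceberg table. It is illustrative only; the broker, topic, payload schema, checkpoint path, and table names are hypothetical, not part of our platform.

```python
# Illustrative sketch only; broker, topic, schema, and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Hypothetical payload schema for the Kafka topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw Kafka stream and parse the JSON value column.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                      # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append the parsed events to an Iceberg table, with checkpointing for exactly-once sinks.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://bucket/checkpoints/events")  # hypothetical path
    .toTable("glue_catalog.analytics.events")  # hypothetical catalog and table
)
query.awaitTermination()
```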
Skills & Experience
* Strong hands-on experience with Spark Structured Streaming, Kafka, and PySpark
* Proven background working with Databricks (Jobs, Notebooks, Runtime, Delta/Iceberg)
* Demonstrated expertise in Spark performance tuning and optimisation
* Solid experience processing and transforming XML data
* In-depth knowledge of AWS data services, including Glue, S3, Athena, Lambda, and DynamoDB
* Practical experience with Apache Iceberg internals (metadata, compaction, schema evolution)
* Strong experience building Spark pipelines on AWS Glue
* Understanding of cluster sizing, cost optimisation, and performance tuning across Glue and Databricks
* Experience running and testing Glue jobs locally using Docker
* Strong testing mindset with hands-on PyTest usage (a minimal example follows this list)
* Experience implementing SCD Type 1 and Type 2 data patterns
* Familiarity with CI/CD tooling such as Jenkins, GitHub, and JFrog
* Strong communication skills with a structured, analytical approach to problem-solving
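
As an indication of the testing style we have in mind, here is a minimal PyTest sketch for a PySpark transformation. The transformation, fixture, and test data are hypothetical and for illustration only.

```python
# Illustrative sketch only; the transformation under test is hypothetical.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql.functions import upper


def normalise_country(df):
    # Hypothetical transformation: upper-case the country code column.
    return df.withColumn("country", upper(df["country"]))


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession shared across the test session.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_normalise_country(spark):
    df = spark.createDataFrame([("a1", "gb"), ("a2", "de")], ["id", "country"])
    result = normalise_country(df).collect()
    assert [row.country for row in result] == ["GB", "DE"]
```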
Nice to Have
* Background in Banking or Financial Services environments
* Exposure to data governance, lineage, or regulatory frameworks
* Experience with Terraform, Airflow, or advanced Databricks platform features