Senior data engineer

Sydney

BULL-IT SOLUTIONS LTD

Posted: 17 March

Offer description

• Proficiency with AWS SageMaker Unified Studio, including its Discover, Build and Govern components.

• Experience developing in SageMaker IDEs and applications (JupyterLab, Spaces and Partner AI Apps).

• Data Analysis and Integrations (Query Editor, Visual ETL Jobs and Data Processing jobs).

• Orchestration of workflows and ML pipelines.

• Understanding of, and the ability to work with, the ML and Gen AI tools available in Unified Studio is an added advantage.

• Setting up projects and data governance in SageMaker Unified Studio.

• Framework Development: Proven ability to design, develop, and implement highly reusable and adaptable data ingestion frameworks capable of handling diverse source types (e.g., databases, APIs, message queues, file systems).

• Low-Latency Real-time: Deep expertise in building real-time data pipelines with stringent sub-second latency requirements, utilising services like AWS Kinesis (Data Streams/Firehose), Apache Kafka, or similar streaming technologies.

• Batch Processing: Experience with robust batch ingestion patterns and tools (e.g., AWS Glue, Apache Spark) for efficient processing of larger datasets.

• Data Transformation: Strong skills in designing and implementing efficient data transformation logic for both streaming and batch data.

• Programming and Scripting:

• Advanced proficiency in Python, particularly for developing scalable data ingestion and export frameworks, API integration, and extensive use of the AWS SDK (Boto3).

• Experience with performance optimisation techniques for Python applications in data-intensive environments.

• Familiarity with other relevant languages and frameworks (e.g., Scala, Spark) for high-performance streaming applications is beneficial.

• AWS Services for Data and Infrastructure:

• In-depth knowledge of core AWS services: Lambda, S3, DynamoDB, CloudWatch, SQS, SNS, and API Gateway.

• Strong understanding of AWS networking (VPC, security groups, private endpoints) and IAM for secure, fine-grained access control.

• Databases (Relational and NoSQL):

§ Expertise with Amazon RDS (Relational Database Service) for both real-time data ingestion and efficient batch export. This includes optimising database performance, connection pooling, and transaction management for high-throughput, low-latency operations.

§ Proficiency with Amazon Redshift for large-scale data warehousing, including data loading strategies (e.g., COPY command), query optimisation, and managing Redshift clusters for analytical workloads and batch export.

§ Experience with NoSQL databases such as Amazon DynamoDB and MongoDB for high-performance, low-latency data storage and retrieval, particularly for real-time applications and feature serving.

• Orchestration and Workflow Management:

• Experience with Amazon Managed Workflows for Apache Airflow (MWAA) for orchestrating complex data pipelines, scheduling batch jobs, and managing dependencies between ingestion, processing, and export tasks.

• Ability to write, deploy, and manage Airflow DAGs (Directed Acyclic Graphs) for robust workflow automation.

• Monitoring and Observability (Real-time Heartbeat Export):

• Ability to design and implement comprehensive real-time monitoring solutions, including custom metrics, detailed logging, and tracing.

• Experience with AWS CloudWatch for collecting, analysing, and acting on operational data, specifically for generating and exporting "heartbeat" signals to external systems or dashboards.

• Knowledge of setting up proactive alerts and automated notifications for system health, performance degradation, and data pipeline anomalies.

• Software Development Practices & Architecture:

• Strong understanding of software engineering principles, design patterns, and architectural best practices for building scalable, maintainable, and reusable data frameworks.

• Proficiency with version control systems (Git) and collaborative development workflows.

• Experience with CI/CD pipelines for automated testing, deployment, and release management of data ingestion and export solutions.

• Familiarity with Infrastructure as Code (e.g., AWS CloudFormation, Terraform) for managing and provisioning AWS resources.

Key Skillset

1. Previous experience developing frameworks for batch and real-time data ingestion into relational and NoSQL databases or file systems.

2. Previous experience with low-latency real-time data ingestion using AWS Kinesis Data Streams, Apache Kafka or similar streaming technologies.

4. Sound knowledge and experience in building Data Warehouses and Data Lakehouses.

5. Data modelling experience is a plus.