* Proficiency with AWS SageMaker Unified Studio, including its components: Discover, Build and Govern
* Experience with development in SageMaker IDEs and applications (JupyterLab, Spaces and Partner AI Apps)
* Experience with data analysis and integrations (Query Editor, Visual ETL Jobs and Data Processing jobs)
* Orchestration of workflows and ML Pipelines
* Understanding of, and expertise with, the ML and Gen AI tools available in Unified Studio is an added advantage
* Framework Development: Proven ability to design, develop, and implement highly reusable and adaptable data ingestion frameworks capable of handling diverse source types (e.g., databases, APIs, message queues, file systems)
* Low‐Latency Real‐time: Deep expertise in building real‐time data pipelines with stringent sub‐second latency requirements, utilising services like AWS Kinesis (Data Streams/Firehose), Apache Kafka, or similar streaming technologies
* Batch Processing: Experience with robust batch ingestion patterns and tools (e.g., AWS Glue, Apache Spark) for efficient processing of larger datasets
* Programming and Scripting: Advanced proficiency in Python, particularly for developing scalable data ingestion and export frameworks, API integration, and extensive use of the AWS SDK (Boto3). Experience with performance optimisation techniques for Python applications in data‐intensive environments
* Familiarity with other relevant languages and frameworks (e.g., Scala, Apache Spark) for high‐performance streaming applications is beneficial
* AWS Services for Data and Infrastructure: In‐depth knowledge of core AWS services: Lambda, S3, DynamoDB, CloudWatch, SQS, SNS, and API Gateway
* Strong understanding of AWS networking (VPC, security groups, private endpoints) and IAM for secure, fine‐grained access control
* Databases (Relational and NoSQL): Expertise with Amazon RDS for both real‐time data ingestion and efficient batch export. This includes optimising database performance, connection pooling, and transaction management for high‐throughput, low‐latency operations. Proficiency with Amazon Redshift for large‐scale data warehousing, including data loading strategies (e.g., COPY command), query optimisation, and managing Redshift clusters for analytical workloads and batch export
* Experience with NoSQL databases such as Amazon DynamoDB and MongoDB
* Strong understanding of software engineering principles, design patterns, and architectural best practices for building scalable, maintainable, and reusable data frameworks
* Proficiency with version control systems (Git) and collaborative development workflows
* Experience with CI/CD pipelines for automated testing, deployment, and release management of data ingestion and export solutions
* Familiarity with Infrastructure as Code (e.g., AWS CloudFormation, Terraform) for managing and provisioning AWS resources
Key Skills
* Previous experience in developing frameworks for batch and real‐time data ingestion across relational and NoSQL databases or file systems
* Previous experience in real‐time data ingestion with low latency using AWS Kinesis Data Streams, Apache Kafka or similar streaming technologies
* Sound knowledge and experience in building Data Warehouses and Data Lakes
* Data modelling experience is a plus