Role: Data Engineer (GCP)
Location: Sydney
Role Type: Contract
Job description: GCP Big Data Engineering
Top 3 Must‑Have Skillsets
1. GCP Big Data Engineering (BigQuery + Dataform + Pub/Sub)
   * Expert in designing and optimising BigQuery schemas, partitioning/clustering, cost/performance tuning, query optimisation, and policy tag integration (see the table-design sketch after this list).
   * Building streaming and batch pipelines using Apache Beam/Dataflow and Pub/Sub with exactly-once semantics, backpressure handling, and replay strategies (see the streaming sketch after this list).
   * Strong experience with Dataform (or similar) for SQL-based transformations, dependency graphs, unit tests, and multi-environment deployments.
2. Python + Orchestration (Airflow/Cloud Composer)
   * Production-grade Python for ETL/ELT, distributed processing, robust error handling, and testable modular design.
   * Designing resilient Airflow DAGs on Cloud Composer: dependency management, retries, SLAs, sensors, service accounts, and secrets.
   * Monitoring, alerting, and Cloud Logging/Stackdriver integration for end-to-end pipeline observability.
3. Data Security & Governance on GCP
   * Hands-on with Dataplex (asset management, data quality, lineage), BigQuery policy tags, Cloud IAM (least privilege, fine-grained access), KMS (key rotation, envelope encryption), and audit trails via Cloud Logging.
   * Practical experience implementing PII controls (data masking, tokenisation, attribute-based access control) and privacy-by-design in pipelines.
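Illustrative sketch (context only): the kind of partitioned/clustered table design referenced under skillset 1, using the google-cloud-bigquery Python client. The project, dataset, and column names (analytics-prod, telemetry.events, event_ts) are hypothetical placeholders, not part of the actual environment.

```python
# Minimal sketch: creating a date-partitioned, clustered BigQuery table.
# Project, dataset, table, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="analytics-prod")  # hypothetical project

schema = [
    bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("customer_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("event_type", "STRING"),
    bigquery.SchemaField("payload", "STRING"),
]

table = bigquery.Table("analytics-prod.telemetry.events", schema=schema)

# Partition by day on the event timestamp to enable partition pruning,
# and cluster on common filter/join keys to cut bytes scanned.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
table.clustering_fields = ["customer_id", "event_type"]

table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id}")
```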
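Illustrative sketch (context only): the streaming pattern referenced under skillset 1, reading from Pub/Sub, windowing with an allowance for late data, and writing windowed aggregates to BigQuery with Apache Beam. Subscription and table names are hypothetical; a production pipeline would also need dead-letter handling, idempotent sinks, and replay strategies.

```python
# Minimal Dataflow/Beam streaming sketch: Pub/Sub -> 1-minute windows -> per-type
# counts -> BigQuery. All resource names are illustrative placeholders.
import json

import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions


class AddWindowStart(beam.DoFn):
    """Attach the window start time so each row identifies its aggregation window."""

    def process(self, kv, win=beam.DoFn.WindowParam):
        event_type, count = kv
        yield {
            "window_start": win.start.to_utc_datetime().isoformat(),
            "event_type": event_type,
            "event_count": count,
        }


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            subscription="projects/analytics-prod/subscriptions/events-sub")
        | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "KeyByType" >> beam.Map(lambda e: (e.get("event_type", "unknown"), 1))
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),   # 1-minute event-time windows
            allowed_lateness=300,      # tolerate up to 5 minutes of late data
        )
        | "CountPerType" >> beam.CombinePerKey(sum)
        | "ToRow" >> beam.ParDo(AddWindowStart())
        # Late firings can re-emit counts for a window; a real sink would be
        # made idempotent (e.g. MERGE on window_start + event_type downstream).
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            table="analytics-prod:telemetry.event_counts_per_minute",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```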
Good‑to‑Have Skillsets
* Cloud Run & APIs: Building stateless microservices for data access/serving layers, implementing REST/gRPC endpoints, authentication/authorisation, rate limiting.
* Data Modelling: Telecom-centric event models (e.g., CDRs, network telemetry, session/flow data), star/snowflake schemas, and lakehouse best practices.
* Performance Engineering: BigQuery slot management, materialised views, BI Engine, partition pruning, cache strategies.
* Secure Source Manager (CI/CD): Pipeline-as-code, automated tests, artifact versioning, environment promotion, canary releases, and GitOps patterns.
* Infrastructure as Code: Terraform/Deployment Manager for reproducible environments, IAM bindings, service accounts, KMS config, Composer environments.
* Data Quality & Testing: Great Expectations/Deequ-like checks, schema contracts, anomaly detection, and automated data validations in CI/CD.
* Streaming Patterns: Exactly-once delivery, idempotent sinks, watermarking, late data handling, windowing strategies.
* Observability & SRE Practices: Metrics, logs, traces, runbooks, SLIs/SLOs for data platforms, major incident response to support DevOps.
* Cost Governance: BigQuery cost controls, slot commitments/reservations, workload management, storage lifecycle policies (see the dry-run cost sketch after this list).
* Domain Knowledge (Mobile Networks): Familiarity with 3G/4G/5G network data, OSS/BSS integrations, network KPIs, and typical analytics use cases.
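Illustrative sketch (context only) for the cost-governance point above: a BigQuery dry run that estimates bytes scanned before a query ships, with a partition-column filter doing the pruning. The query, project, and table names are hypothetical placeholders.

```python
# Minimal sketch: estimate query cost with a BigQuery dry run before execution.
# Project, table, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="analytics-prod")  # hypothetical project

query = """
    SELECT event_type, COUNT(*) AS events
    FROM `analytics-prod.telemetry.events`
    WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)  -- partition filter
    GROUP BY event_type
"""

job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(query, job_config=job_config)

# A dry run returns immediately with statistics and processes no data.
gib = job.total_bytes_processed / 1024 ** 3
print(f"Estimated scan: {gib:.2f} GiB")

# Example guardrail (hypothetical budget): refuse queries scanning more than 100 GiB.
assert gib <= 100, "Query exceeds the agreed scan budget; revisit partition/cluster filters."
```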
Experience Level
* 7+ years total in data engineering; 4+ years on GCP with production systems.
* Evidence of impact:
   * Led end-to-end delivery of large-scale pipelines (batch + streaming) with strict PII governance.
   * Owned performance/cost optimisation initiatives in BigQuery/Dataflow at scale.
   * Implemented CI/CD for data workflows (Secure Source Manager) including automated tests and environment promotion.
   * Drove operational excellence (SLAs, incident management, RTO/RPO awareness, DR patterns).
* Soft skills: Technical leadership, code reviews, mentoring, clear documentation, cross-functional collaboration with Network/Analytics teams, and a bias for automation & reliability.