Data Extraction Engineer (Web Scraping)

Perth
PST.AG
Web
Posted: 5 March
Offer description

Role Overview:

The Data Extraction Engineer designs extraction systems, not just scripts. They build and maintain a next-generation data acquisition platform that treats web scraping as a declarative, specification-driven discipline. Instead of hard-coding XPaths for every site, the engineer defines what data is needed (using schemas, natural-language descriptions, or visual blueprints) and lets intelligent pipelines figure out how to get it.

Key Responsibilities:

Specification-Driven Extraction Engineering-

  • Design and maintain declarative extraction specifications (using Pydantic models, JSON schemas, or domain-specific languages) that describe exactly which fields to capture, their types, and validation rules.
  • Implement pipelines that translate these specifications into executable extraction plans, leveraging both classical (Scrapy, Playwright) and AI-augmented (LLM-based semantic parsing) backends.
  • Build reusable specification libraries for recurring data types (product prices, tariff codes, regulatory texts) to accelerate onboarding of new sources.
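As a rough illustration of what a declarative extraction specification could look like, the sketch below describes fields, types, and validation rules as data and applies them to a raw record. It is not PST.AG's actual platform: it uses standard-library dataclasses in place of Pydantic so that it runs without dependencies, and the names FieldSpec, PRODUCT_SPEC, parse_price, and apply_spec are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Any, Callable


@dataclass(frozen=True)
class FieldSpec:
    """One field to capture: what it is, how to type it, how to validate it."""
    name: str
    description: str                      # natural-language hint for a semantic backend
    parse: Callable[[str], Any]           # converts raw text into the target type
    validate: Callable[[Any], bool] = lambda value: True


def parse_price(raw: str) -> Decimal:
    # Assumption: '.' is the decimal separator; currency symbols and commas are dropped.
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in ".-")
    return Decimal(cleaned)


# Hypothetical specification for a recurring data type (product listings).
PRODUCT_SPEC = [
    FieldSpec("title", "Product name as shown to the buyer", str.strip, lambda v: len(v) > 0),
    FieldSpec("price", "Current selling price", parse_price, lambda v: v > 0),
]


def apply_spec(spec, raw_record):
    """Apply a specification to raw extracted text; return (clean_record, errors)."""
    clean, errors = {}, []
    for field_spec in spec:
        raw = raw_record.get(field_spec.name)
        if raw is None:
            errors.append(f"{field_spec.name}: missing")
            continue
        try:
            value = field_spec.parse(raw)
        except (InvalidOperation, ValueError):
            errors.append(f"{field_spec.name}: cannot parse {raw!r}")
            continue
        if not field_spec.validate(value):
            errors.append(f"{field_spec.name}: failed validation")
            continue
        clean[field_spec.name] = value
    return clean, errors
```

The specification says nothing about selectors or page structure; any backend (a classical spider or an LLM-based parser) only needs to produce the raw text for each named field.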

Autonomous & Self-Healing Systems-

  • Deploy self-healing spiders that automatically detect website layout changes and repair themselves using Model Context Protocol (MCP) servers (e.g., Scrapy MCP Server, Playwright MCP).
  • Integrate semantic extraction (Scrapy-LLM, custom LLM pipelines) to eliminate selector brittleness: spiders rely on field descriptions, not fragile XPaths.
  • Orchestrate complex, multi-step browsing workflows with agentic frameworks (BMAD/TEA, AutoGPT-like agents) that reason about page state, adapt to anti-bot measures, and correct their own behaviour in real time.

Platform Thinking & Reusability-

  • Move beyond one-off scrapers: build a component-based extraction platform where selectors, login handlers, and pagination logic are shared, versioned, and tested.
  • Implement monitoring, alerting, and automatic rollback for failed extraction runs.
  • Champion ethical crawling by design: rate limiting, robots.txt respect, and compliance with GDPR/CCPA are built into the specification layer, not retrofitted.
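One way to read "built into the specification layer" is that the crawl policy is declared as data and enforced before any request is made. The sketch below is a minimal example under that assumption, using Python's standard urllib.robotparser; the names CrawlPolicy and build_policy_checker are hypothetical, and the robots.txt text is passed in as a string so the example needs no network access.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class CrawlPolicy:
    """Hypothetical policy block that would live alongside an extraction spec."""
    user_agent: str
    min_delay_seconds: float   # our own floor for the delay between requests


def build_policy_checker(robots_txt: str, policy: CrawlPolicy):
    """Return (allowed, delay): a URL predicate and the delay to honour."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Use whichever is stricter: the site's Crawl-delay or our own minimum.
    site_delay = parser.crawl_delay(policy.user_agent) or 0
    delay = max(policy.min_delay_seconds, float(site_delay))

    def allowed(url: str) -> bool:
        return parser.can_fetch(policy.user_agent, url)

    return allowed, delay
```

A scheduler could call allowed(url) before queueing a request and wait delay seconds between requests to the same host; GDPR/CCPA handling is out of scope for this sketch.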

Collaboration & Continuous Innovation-

  • Partner with data scientists and domain experts to refine extraction specifications for complex, unstructured domains (e.g., legal texts, tariff classifications).
  • Evaluate and pilot emerging tools to push automation coverage beyond 90%.
  • Document and evangelise specification-driven best practices across the engineering organisation.

Candidate Profile:

Education and Experience-

  • Bachelor's degree in Computer Science
  • 3+ years of experience in web scraping or data extraction

Skills and competences-

  • Specification-Driven Extraction: Experience defining extraction requirements via schemas (Pydantic, JSON Schema) and executing them through both traditional crawlers and LLM-based semantic parsers.
  • Self-Healing & Semantic Extraction: Hands-on use of Scrapy-LLM, Scrapy MCP Server, or similar systems that decouple field definitions from page structure.
  • Agentic Workflows: Familiarity with frameworks that give LLMs browser control (Playwright + MCP, BMAD/TEA) to handle complex, non-deterministic crawling tasks.
  • Classical Scraping Fundamentals: You still know how to write a Scrapy spider or a Playwright script when needed, but you actively seek to replace that work with reusable, specification-driven components.
  • Data Validation & Storage: Ability to define validation rules within specifications and land clean data into SQL/NoSQL databases or data lakes.
  • Python proficiency: the focus is on an extraction engineer who happens to use Python.
  • HTTP, DOM, XPath, CSS.
  • Basic API integration and authentication flows.

Preferred / Nice-to-Have Skills:

  • Contributions to open-source scraping or AI-automation projects.
  • Experience training or fine-tuning small LLMs for domain-specific extraction.
  • Familiarity with data privacy engineering (GDPR, CCPA) baked into specification design.
  • DevOps light: Docker, CI/CD for testing extraction specifications.

Mindset & Approach (Non-Negotiable):

  • A strong belief that the future of scraping is declarative, not imperative. You'd rather write a schema that says "extract the price" than debug an XPath when a website redesigns.
  • A desire to shift from "code that scrapes" to "systems that understand extraction".

© 2026 Jobstralia - All Rights Reserved
