00 · Scraping · ETL · AI-enriched pipelines

Turn the messy web into clean rows you can act on.

We build scrapers, ingestion pipelines and AI classifiers that pull signal from competitor sites, marketplaces, public records, partner feeds and unstructured email — and land it in your CRM, data warehouse or spreadsheet in the shape your team actually uses.
01 · What we build

End-to-end. Not scraper-of-the-week scripts.

Built to run quietly in the background and not break every Tuesday.

Resilient scrapers

Playwright + proxy rotation + CAPTCHA strategies. Handles SPAs, infinite scroll, cookie walls, login-gated pages — ethically, respecting robots.txt and rate limits.
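
A minimal sketch of that pattern, using Playwright's sync Python API; the proxy pool, selector and crawl delay are illustrative placeholders, not client config:

```python
# Sketch: polite scraping with robots.txt checks, proxy rotation and a
# fixed crawl delay. All endpoints and selectors below are placeholders.
import itertools
import time
import urllib.robotparser
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright

PROXIES = itertools.cycle(["http://proxy-1:8080", "http://proxy-2:8080"])  # hypothetical pool
CRAWL_DELAY = 2.0  # seconds between page loads, tuned per site

def allowed(url: str, agent: str = "*") -> bool:
    """Check the origin's robots.txt before fetching anything."""
    origin = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{origin.scheme}://{origin.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(agent, url)

def scrape(urls: list[str]):
    with sync_playwright() as p:
        # one proxy per browser session; rotate by relaunching
        browser = p.chromium.launch(proxy={"server": next(PROXIES)})
        page = browser.new_page()
        for url in urls:
            if not allowed(url):
                continue  # disallowed by robots.txt: skip, never work around
            page.goto(url, wait_until="networkidle")  # lets SPA requests settle
            rows = page.eval_on_selector_all(
                ".listing",  # placeholder selector
                "els => els.map(e => e.innerText)",
            )
            yield url, rows
            time.sleep(CRAWL_DELAY)
        browser.close()
```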

ETL pipelines

Airbyte, dbt and custom Python. Raw → staged → clean → enriched. Airflow or Temporal for scheduling, retries, backfills.
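
In practice that is a scheduled DAG. A sketch in Airflow 2.x TaskFlow style, with task bodies stubbed out; names and the daily schedule are illustrative, not a real deployment:

```python
# Sketch: raw → staged → clean → enriched as an Airflow DAG. Retries and
# catchup=True give scheduled retries and date-range backfills for free.
from datetime import datetime, timedelta

from airflow.decorators import dag, task

@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=True,  # re-running past dates backfills the pipeline
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
)
def listings_pipeline():
    @task
    def extract_raw() -> str:
        # e.g. an Airbyte sync or scraper run; returns a batch id
        return "batch-0001"

    @task
    def stage(batch: str) -> str:
        # land raw payloads as-is in a staging table
        return batch

    @task
    def clean(batch: str) -> str:
        # dbt-style transforms: dedupe, cast types, enforce schema
        return batch

    @task
    def enrich(batch: str) -> None:
        # join reference data, call the classifier, write the final table
        ...

    enrich(clean(stage(extract_raw())))

listings_pipeline()
```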

AI enrichment

Classify, summarise, score, or extract 20 structured fields from one free-text field. Claude / GPT / local models, depending on sensitivity.
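
A sketch of the extraction step with the Anthropic Python SDK; the field list, prompt and model choice are illustrative, and production adds schema validation on the reply:

```python
# Sketch: one free-text listing description in, structured fields out.
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

FIELDS = ["asking_price", "city", "floor_area_m2", "build_year"]  # example subset

def enrich(description: str) -> dict:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: chosen per sensitivity and cost
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                f"Extract {FIELDS} from the listing below as a JSON object. "
                "Use null for missing fields. Reply with JSON only.\n\n"
                + description
            ),
        }],
    )
    return json.loads(msg.content[0].text)
```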

Real-time + batch

Near-real-time when seconds matter (price monitoring), batch when it doesn't. Same pipeline, two modes.
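
What "same pipeline, two modes" means concretely: one shared transform, two drivers. In this sketch, fetch_since and process stand in for the real source and pipeline:

```python
# Sketch: the poll loop and the batch backfill call the same code path.
import time
from datetime import datetime, timedelta

def fetch_since(ts: datetime) -> list[dict]:
    ...  # placeholder: scraper or API read returning rows newer than ts

def process(rows: list[dict]) -> None:
    ...  # placeholder: the one shared clean + enrich + load path

def run_realtime(poll_seconds: int = 30) -> None:
    cursor = datetime.utcnow()
    while True:  # seconds matter: poll tightly, process small deltas
        rows = fetch_since(cursor)
        if rows:
            process(rows)
            cursor = datetime.utcnow()
        time.sleep(poll_seconds)

def run_batch(days_back: int = 30) -> None:
    # seconds don't matter: one nightly sweep through the same path
    process(fetch_since(datetime.utcnow() - timedelta(days=days_back)))
```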

Compliance-first

GDPR and AVG aware. We document what's collected, where it's stored, and who sees it. Anonymisation and PII-redaction built in.
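
A deliberately simple sketch of the redaction step: regex passes for emails and Dutch phone numbers run before anything is stored. Real pipelines pair this with field-level allowlists; the patterns here are illustrations, not an exhaustive PII net:

```python
# Sketch: scrub obvious PII from free text before it lands anywhere.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone_nl": re.compile(r"(?:\+31|0031|0)[1-9](?:[ -]?\d){8}"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} redacted]", text)
    return text

print(redact("Bel 06 12345678 of mail jan@example.nl"))
# -> Bel [phone_nl redacted] of mail [email redacted]
```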

Integrated output

CRM (HubSpot, Salesforce), data warehouse (BigQuery, Snowflake, Postgres), or Google Sheets + email digest — whatever your ops actually uses.
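
On the warehouse side, loads are idempotent upserts so re-runs never duplicate rows. A Postgres sketch; table, columns and DSN are illustrative:

```python
# Sketch: batched upsert keyed on the source's stable id.
import psycopg2
from psycopg2.extras import execute_values

def load(rows: list[tuple], dsn: str = "postgresql://localhost/warehouse") -> None:
    # connection context manager commits the transaction on success
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(
            cur,
            """
            INSERT INTO listings (source_id, url, price, scraped_at)
            VALUES %s
            ON CONFLICT (source_id) DO UPDATE
              SET price = EXCLUDED.price, scraped_at = EXCLUDED.scraped_at
            """,
            rows,
        )
```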

02 · In production now

Scrapers quietly running for years.

Select examples — names anonymised where needed.
01 · Funda listings → CRM

Real estate — scrapes new listings matching investor criteria every 15 min, enriches with property data, pushes to CRM.

02 · Competitor pricing watch

Daily price monitoring across 80 competitors, with an AI classifier that flags meaningful changes vs. noise (sketched after this list).

03 · Partner catalog ingestion

Partner PDFs, CSVs, FTP drops normalised into a unified product schema — with image optimisation and category inference.

04 · Government register feeds

KvK, BAG, Kadaster, EU open data — ingested, joined, and surfaced in the client's internal tools.

05 · Lead scraping + enrichment

Public directory → Apollo / Clearbit enrichment → AI-scored for fit → Slack alert for hot leads. Fully compliant.

06 · Review aggregation

Google, Trustpilot, sector-specific sites → sentiment + theme extraction → weekly exec digest.
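
On the pricing watch (02 above): the signal/noise split starts with a cheap numeric gate, and only survivors reach the AI classifier. A sketch, where the 2% threshold is an illustrative default tuned per client:

```python
# Sketch: filter out trivial price wobble before spending LLM calls.
def is_meaningful(old: float, new: float, threshold: float = 0.02) -> bool:
    if old <= 0:
        return new > 0  # new or relisted product: always worth a look
    return abs(new - old) / old >= threshold

changes = [(99.0, 99.0), (99.0, 89.0), (99.0, 99.49)]
flagged = [c for c in changes if is_meaningful(*c)]
# -> only (99.0, 89.0) goes on to the classifier
```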

03 · Process

Two weeks to a pipeline. Years of uptime.

Fixed scope, with an optional monitoring retainer after launch.
01 · Audit + schema

We pick sources, verify access and Terms of Service, design the target schema, and write the compliance note.

02 · Build + test

Scrapers + ETL + enrichment end-to-end. Run against real data, diff against manual samples, tune thresholds.

03 · Deploy + monitor

Ship to your infra (or ours), set up alerts for breakage, 30 days on-call. Optional monthly monitoring retainer.
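
"Alerts for breakage" in the simplest form: compare each run's row count to a trailing baseline and ping Slack when it collapses. A sketch; the webhook URL and the 50% floor are placeholders:

```python
# Sketch: a silent scraper usually fails by returning too little, not by
# crashing, so alert on volume collapse rather than exceptions alone.
import json
import statistics
import urllib.request

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def check_run(rows_this_run: int, recent_counts: list[int]) -> None:
    baseline = statistics.median(recent_counts)
    if rows_this_run < 0.5 * baseline:  # half the usual volume: likely breakage
        payload = {"text": f"Scraper degraded: {rows_this_run} rows vs median {baseline}"}
        req = urllib.request.Request(
            SLACK_WEBHOOK,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
```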

04 · Stack

Battle-tested. Observable.

Every run logged. Every failure alerted.
01 · Playwright · Browser automation
02 · Scrapy · HTTP scraping
03 · Python · ETL + enrichment
04 · Airbyte · ELT connectors
05 · dbt · Data transformations
06 · Temporal · Durable workflows
07 · Claude · Classification + extraction
08 · PostgreSQL · Warehouse
FAQ

Common questions.

Straight answers — ask us anything not listed.
Is scraping legal?

For public data, typically yes, within limits. We check each site's robots.txt and Terms, respect rate limits, never bypass paywalls, and avoid personal data beyond what's publicly listed. For grey zones we'll tell you straight and document the risk.

Got a source you need in rows? Tell us about it.

Twenty minutes, video call. You leave with a plan — whether you hire us or not.

Engagement standard
  • Duration: 3 weeks · fixed scope
  • Languages: NL + EN
  • Pricing: on request
  • Response: < 4h weekdays