2021-2023, Product team

Resilient Data Collection Tooling

Data extraction tooling for a changing web environment with persistent anti-bot friction.

Context: High-friction web
Focus: Resilience
Approach: HTTP + JS analysis
Resilience model

Reliability comes from understanding the request flow.

Stability improved through network and JavaScript analysis, not endless retries.

1. Map request flows and client-side logic
2. Adapt extraction logic to drift and anti-bot friction
3. Reduce firefighting and restore a predictable workflow
Role

R&D Data Collection Engineer

Stack
Python, Web scraping, Reverse engineering, Playwright, ClickHouse
Problem

Standard collection approaches kept breaking because of page drift, client-side logic, and defensive mechanisms.

Solution

Worked at the intersection of Python, HTTP, and JavaScript reverse engineering: traced request flows, adjusted extraction logic, and hardened the pipeline.
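
One way to picture "hardening" at the extraction step is a parser that checks the payload shape before using it, so that drift fails loudly instead of producing silently wrong rows. The sketch below is illustrative only: the field names, the `Record` type, and `DriftError` are hypothetical and not taken from the actual project.

```python
from dataclasses import dataclass


class DriftError(Exception):
    """Raised when a payload no longer matches the expected shape."""


# Hypothetical field names and types, chosen only for illustration.
REQUIRED_FIELDS = {"id": int, "title": str, "price": (int, float)}


@dataclass(frozen=True)
class Record:
    id: int
    title: str
    price: float


def parse_item(item: dict) -> Record:
    """Validate one decoded JSON item and convert it to a Record."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in item:
            problems.append(f"missing field: {name}")
        elif not isinstance(item[name], expected):
            actual = type(item[name]).__name__
            problems.append(f"unexpected type for {name}: {actual}")
    if problems:
        # Report every mismatch at once so one alert describes the whole drift.
        raise DriftError("; ".join(problems))
    return Record(id=item["id"], title=item["title"], price=float(item["price"]))
```

A caller can treat `DriftError` as a signal to stop and inspect the request flow rather than to retry, since repeating the same request will return the same changed shape.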

Impact

Less firefighting, more predictable extraction.

What I built
  • Reworked unstable request flows into repeatable extraction logic.
  • Improved resilience to anti-bot changes and page-structure drift.
  • Balanced delivery speed with reliability under constant external change.
What this proved
  • Data collection is an infrastructure problem as much as an extraction problem.
  • Stability comes from understanding the request model, not from brute-force retries.
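
The contrast with brute-force retries can be sketched as a small classifier that decides what a response means before anything is retried. This is a minimal sketch under stated assumptions: the endpoint is expected to return JSON, the status groupings are a simplification, and the challenge markers are hypothetical, since real challenge pages differ per site.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    RETRY = "retry"      # transient failure; retrying with backoff may help
    BLOCKED = "blocked"  # likely anti-bot response; retrying as-is will not help
    DRIFT = "drift"      # response shape changed; extraction logic needs an update


# Assumed grouping of statuses that are usually temporary.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}

# Hypothetical markers, for illustration only.
CHALLENGE_MARKERS = ("captcha", "verify you are human")


def classify(status: int, content_type: str, body: str) -> Outcome:
    """Map a response to the action it calls for."""
    lowered = body.lower()
    if status == 403 or any(marker in lowered for marker in CHALLENGE_MARKERS):
        return Outcome.BLOCKED
    if status in TRANSIENT_STATUSES:
        return Outcome.RETRY
    if status == 200 and "application/json" in content_type.lower():
        return Outcome.OK
    # Anything else is unexpected for a JSON endpoint and is treated as drift.
    return Outcome.DRIFT
```

Only `Outcome.RETRY` feeds a retry loop in this sketch; `BLOCKED` and `DRIFT` route to investigation, which is where the network and JavaScript analysis comes in.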
Related work

More projects

2025-2026

nnzen model catalog

Solo

A live catalog with 500+ model cards that makes model research less scattered.

Python, FastAPI, LLM APIs
2025

Custom Agent Core with MCP

Solo

LLM core with a plugin execution layer for a developer assistant: hot reload, tool chains, and explicit context handoff.

Python, FastAPI, MCP