Extractor

Robust LLM-powered web data extraction in TypeScript

Extractor by Lightfeed is a TypeScript library that uses LLMs to extract structured data from websites. It handles messy HTML, JavaScript-rendered content, and inconsistent page layouts that break traditional scrapers. Define your schema and let the LLM figure out where the data lives.
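The "define your schema" workflow can be illustrated with a minimal, self-contained TypeScript sketch. It is not Extractor's API. `Schema`, `buildExtractionPrompt`, and `validateExtraction` are hypothetical names. The model call is replaced by a hard-coded reply so the example runs offline. The sketch shows the general shape of schema-driven extraction: describe the fields you want, ask the model for JSON, then check the reply against the schema before trusting it.

```typescript
// Hypothetical sketch of schema-driven extraction; NOT the Extractor API.
// Each schema key maps to the JSON type the model should return.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;
type Extracted = Record<string, string | number | boolean | null>;

// Build a prompt that lists the wanted fields and asks for JSON only.
function buildExtractionPrompt(schema: Schema, pageText: string): string {
  const fields = Object.entries(schema)
    .map(([field, type]) => `- ${field}: ${type}`)
    .join("\n");
  return [
    "Extract these fields from the page. Reply with a JSON object only.",
    fields,
    "Use null for any field that is not on the page.",
    `Page:\n${pageText}`,
  ].join("\n\n");
}

// Parse the model's reply and check every field against the schema,
// so downstream code receives predictable types.
function validateExtraction(schema: Schema, reply: string): Extracted {
  const parsed: unknown = JSON.parse(reply);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Expected a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  const result: Extracted = {};
  for (const [field, type] of Object.entries(schema)) {
    const value = obj[field];
    if (value === null || value === undefined) {
      result[field] = null; // missing data becomes null rather than an error
    } else if (typeof value === type) {
      result[field] = value as string | number | boolean;
    } else {
      throw new Error(`Field "${field}" should be ${type}, got ${typeof value}`);
    }
  }
  return result;
}

// Usage: the reply below is hard-coded in place of a real LLM call.
const productSchema: Schema = { title: "string", price: "number", inStock: "boolean" };
const extractionPrompt = buildExtractionPrompt(
  productSchema,
  "<h1>Desk Lamp</h1><span>$24.99</span>",
);
console.log(extractionPrompt);
const mockReply = '{"title": "Desk Lamp", "price": 24.99, "inStock": null}';
console.log(validateExtraction(productSchema, mockReply));
```

The sketch maps missing fields to `null` instead of throwing. That is one design choice among several; a stricter pipeline might reject incomplete records outright.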

Panel Reviews

The Builder

Developer Perspective

Ship

“Schema-driven extraction with LLM fallback is exactly right. Traditional scrapers break on every site redesign — Extractor adapts because it understands the content semantically. The TypeScript-first approach with strong typing on outputs is chef's kiss for building data pipelines.”
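The Builder's "LLM fallback" framing describes a common pattern. A cheap, deterministic extractor runs first, and the model is called only when that extractor comes up empty. The sketch below illustrates the general pattern and does not describe how Extractor is implemented. `extractWithFallback`, `priceByPattern`, and `priceByLlm` are hypothetical names. The regex stands in for a CSS selector, and `priceByLlm` is a mock so the code runs without network access.

```typescript
// Hypothetical "cheap first, LLM fallback" pattern; not Extractor's internals.
type FieldExtractor<T> = (html: string) => Promise<T | null>;

async function extractWithFallback<T>(
  html: string,
  fast: FieldExtractor<T>,
  llm: FieldExtractor<T>,
): Promise<{ value: T | null; source: "fast" | "llm" }> {
  const quick = await fast(html);
  if (quick !== null) return { value: quick, source: "fast" };
  // Only spend tokens when the deterministic path finds nothing.
  return { value: await llm(html), source: "llm" };
}

// Brittle, layout-specific rule standing in for a CSS selector.
const priceByPattern: FieldExtractor<number> = async (html) => {
  const m = html.match(/class="price">\$([\d.]+)</);
  return m ? Number(m[1]) : null;
};

// Mock of an LLM call; a real version would send the page to a model.
const priceByLlm: FieldExtractor<number> = async (html) => {
  const m = html.match(/\$([\d.]+)/);
  return m ? Number(m[1]) : null;
};

extractWithFallback('<span class="price">$24.99</span>', priceByPattern, priceByLlm)
  .then((r) => console.log(r.source, r.value)); // fast 24.99
extractWithFallback('<div data-cost="x">Now $19.50</div>', priceByPattern, priceByLlm)
  .then((r) => console.log(r.source, r.value)); // llm 19.5
```

Recording `source` on each result makes it easy to measure how often the paid LLM path is actually taken.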

The Skeptic

Reality Check

Ship

“LLM extraction costs add up fast at scale. But for the use cases where you need it — scraping sites with unpredictable layouts, extracting from pages that change frequently — the reliability improvement over CSS selectors easily justifies the token spend.”
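The Skeptic's cost concern can be quantified before a large crawl. Spend grows linearly with the number of pages, the tokens per page, and the per-token price. The helper below is a hypothetical back-of-the-envelope estimator, and `estimateRunCostUsd` and its field names are illustrative. The rates in the usage line are placeholders, not any provider's actual pricing.

```typescript
// Hypothetical cost estimator; all rates passed in are placeholders.
interface CostInputs {
  pages: number;
  inputTokensPerPage: number;  // page content sent to the model
  outputTokensPerPage: number; // structured JSON returned
  usdPerMillionInput: number;
  usdPerMillionOutput: number;
}

function estimateRunCostUsd(c: CostInputs): number {
  const input = (c.pages * c.inputTokensPerPage * c.usdPerMillionInput) / 1_000_000;
  const output = (c.pages * c.outputTokensPerPage * c.usdPerMillionOutput) / 1_000_000;
  return input + output;
}

// Illustrative inputs only: 10,000 pages at 8,000 input / 500 output tokens each.
const runCost = estimateRunCostUsd({
  pages: 10_000,
  inputTokensPerPage: 8_000,
  outputTokensPerPage: 500,
  usdPerMillionInput: 1,
  usdPerMillionOutput: 4,
});
console.log(runCost.toFixed(2)); // "100.00"
```

Every term is linear, so halving the input tokens sent per page also halves the input cost. Stripping scripts, styles, and boilerplate markup before extraction is one way to cut those tokens.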

The Creator

Content & Design

Ship

“I have been using this to pull structured data from competitor landing pages and product directories. The schema definition is intuitive and the extraction quality is surprisingly consistent even across wildly different page designs.”