Output Formats

The output_format parameter decides what data.content contains. Pick the format closest to what your code consumes so you do less post-processing.

| Format | Returns | Use for |
| --- | --- | --- |
| `html` | Raw HTML | Custom parsing, archiving |
| `markdown` | Clean Markdown | LLM input, content pipelines |
| `plain_text` | Stripped text | Search indexing, NLP |
| `autoparse` | Auto-detected JSON | Quick structured data |
| `screenshot` | Base64 PNG | Visual capture (see Screenshots) |
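
Since the format is chosen by name, a client can check that name before sending anything. The sketch below is one way to do that, not part of the API: `build_payload` and `OUTPUT_FORMATS` are hypothetical names, and only the `url` and `output_format` fields and the five format names come from this page.

```python
# Format names copied from the table above; the helper itself is hypothetical.
OUTPUT_FORMATS = ("html", "markdown", "plain_text", "autoparse", "screenshot")


def build_payload(url: str, output_format: str = "html") -> dict:
    """Return a request body, rejecting format names this page does not list."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(
            f"unknown output_format {output_format!r}; expected one of {OUTPUT_FORMATS}"
        )
    return {"url": url, "output_format": output_format}


payload = build_payload("https://example.com", "markdown")
```

Failing early on a misspelled format is cheaper than discovering it from an error response.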

:::note Extraction is separate
Precise field extraction is not an output_format; it's driven by parameters that work alongside any format:

- Add css_selectors → results in data.css_extracted.
- Add templates → results in data.template_extracted.
:::

`html` (default)

Returns the page exactly as rendered, in data.content. Best when you have your own parser or need the full DOM.

{ "url": "https://example.com", "output_format": "html" }
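
If you bring your own parser, the raw HTML in data.content can go straight into it. As a minimal sketch, the example below reads the page title with Python's standard `html.parser`. The `response` dict is a hand-written stand-in for a real reply; only the data.content path is taken from this page.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Ignore later <title> elements (for example inside inline SVG).
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical response body for illustration.
response = {
    "data": {
        "content": "<html><head><title>Example Domain</title></head>"
                   "<body><h1>Hi</h1></body></html>"
    }
}

parser = TitleParser()
parser.feed(response["data"]["content"])
page_title = parser.title.strip()
```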

`markdown`

Converts the main content to Markdown, dropping navigation, scripts, and styling. Ideal for feeding pages into an LLM or a content database.

{ "url": "https://blog.example.com/post", "output_format": "markdown" }
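
Before handing Markdown to an LLM, it often helps to split it into sections. The sketch below shows one simple approach, assuming the Markdown arrives in data.content; `split_by_heading` is a hypothetical helper and the `response` dict is invented sample data.

```python
import re


def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each ATX heading.

    Simplification: fenced code blocks are not tracked, so a line such as
    "# comment" inside a fence would also start a new chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


# Hypothetical response body for illustration.
response = {"data": {"content": "# Title\n\nIntro.\n\n## Part one\n\nBody."}}
chunks = split_by_heading(response["data"]["content"])
```

Each chunk keeps its heading, so a prompt built from a single chunk still carries its own context.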

`plain_text`

Returns readable text with markup removed. Good for full-text search, keyword extraction, and sentiment analysis.

{ "url": "https://news.example.com/article", "output_format": "plain_text" }
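
As an illustration of the keyword-extraction use case, the sketch below counts the most frequent words in the returned text. It assumes the text arrives in data.content; `top_keywords`, the tiny stopword list, and the sample `response` are all hypothetical.

```python
import re
from collections import Counter

# A deliberately small stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "as"}


def top_keywords(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword tokens with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical response body for illustration.
response = {
    "data": {"content": "Rates rise as the bank holds rates. The bank cites inflation."}
}
keywords = top_keywords(response["data"]["content"], n=2)
```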

`autoparse`

We detect the page type (product, article, listing) and return structured JSON under data.extracted_data without you writing selectors. Great for a quick start; use css_selectors when you need exact control.

{ "url": "https://shop.example.com/p/1", "output_format": "autoparse" }
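
Because the detected structure varies by page type, client code should read data.extracted_data defensively. The sketch below is one possible pattern. The field names inside `extracted_data` ("type", "title", "price") are invented for illustration, since this page does not specify the schema, and `get_extracted_data` is a hypothetical helper.

```python
def get_extracted_data(response: dict) -> dict:
    """Return data.extracted_data, or an empty dict when it is absent."""
    data = response.get("data") or {}
    extracted = data.get("extracted_data")
    return extracted if isinstance(extracted, dict) else {}


# Hypothetical autoparse response; the inner keys are assumptions.
response = {
    "data": {
        "extracted_data": {"type": "product", "title": "Blue Mug", "price": "12.00"}
    }
}

product = get_extracted_data(response)
title = product.get("title", "")

# If a field you rely on is missing, that is the cue to switch to css_selectors.
needs_selectors = "price" not in product
```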

Precise extraction

To pull exact fields, add css_selectors (see CSS extraction). The output_format can stay html:

{
  "url": "https://shop.example.com/p/1",
  "css_selectors": { "title": "h1", "price": ".price" }
}
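
Once the mapped fields come back in data.css_extracted, they usually need light cleanup. The sketch below shows one way to normalize them. It is built on assumptions: this page does not say whether each field is a string or a list of matches, so `first_text` accepts both, and the `response` dict is invented sample data. `parse_price` assumes commas are thousands separators.

```python
import re
from decimal import Decimal
from typing import Optional


def first_text(value) -> str:
    """Accept a string, a list of strings, or None and return one string."""
    if isinstance(value, list):
        return str(value[0]) if value else ""
    return "" if value is None else str(value)


def parse_price(raw: str) -> Optional[Decimal]:
    """Pull the first decimal number out of a price string, or return None."""
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    return Decimal(match.group()) if match else None


# Request body from the example above.
payload = {
    "url": "https://shop.example.com/p/1",
    "css_selectors": {"title": "h1", "price": ".price"},
}

# Hypothetical response; the value shape (list of strings) is an assumption.
response = {"data": {"css_extracted": {"title": ["Blue Mug"], "price": ["$1,299.50"]}}}

fields = response["data"]["css_extracted"]
title = first_text(fields.get("title"))
price = parse_price(first_text(fields.get("price")))
```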

:::tip
You can combine css_selectors with templates (e.g. ["links", "images"]) to get both your mapped fields and built-in extractions in one call.
:::
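
A combined request and the matching read might look like the sketch below. The `css_selectors` and `templates` parameters and the data.css_extracted and data.template_extracted paths come from this page; the helper names and the sample `response` contents are hypothetical.

```python
def build_extraction_payload(url: str, css_selectors=None, templates=None) -> dict:
    """Build a request body that includes only the extraction options given."""
    payload = {"url": url}
    if css_selectors:
        payload["css_selectors"] = dict(css_selectors)
    if templates:
        payload["templates"] = list(templates)
    return payload


def collect_extractions(response: dict) -> dict:
    """Gather both extraction results, defaulting to empty dicts when absent."""
    data = response.get("data") or {}
    return {
        "css": data.get("css_extracted") or {},
        "templates": data.get("template_extracted") or {},
    }


payload = build_extraction_payload(
    "https://shop.example.com/p/1",
    css_selectors={"title": "h1", "price": ".price"},
    templates=["links", "images"],
)

# Hypothetical response; the inner values are invented for illustration.
response = {
    "data": {
        "css_extracted": {"title": "Blue Mug"},
        "template_extracted": {"links": ["https://shop.example.com/cart"]},
    }
}
results = collect_extractions(response)
```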
