
Your First Real Scrape

The quickstart fetched raw HTML. This page extracts the specific fields you actually want from a real-world, JavaScript-heavy page — the kind of task you'll do every day.

The goal

Say you want the title and price from a product page that renders content with JavaScript and sits behind an anti-bot wall. Instead of downloading the whole DOM and parsing it yourself, ask OmniScrape to do the extraction.

The request

curl -X POST https://api.omniscrape.io/v1/scrape \
  -H "X-API-Key: $OMNISCRAPE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://shop.example.com/product/123",
    "mode": "js_rendering",
    "css_selectors": {
      "title": "h1.product-title",
      "price": ".price-amount",
      "in_stock": ".stock-status"
    },
    "js_wait_selector": ".price-amount"
  }'

What each field does:

  • mode: "js_rendering" spins up a real browser so JavaScript executes. See Modes.
  • css_selectors maps each output key you choose to a CSS selector on the page. Each key comes back in data.css_extracted with the text of the matched element. Supplying selectors is enough; you don't need to request a special output format.
  • js_wait_selector makes the browser wait until an element matching .price-amount appears before extracting. This way you don't capture the empty loading skeleton that renders first.
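If you build requests for many product pages, a small helper can keep these three fields together. The following is a minimal sketch. `build_scrape_request` is a hypothetical name, not part of any OmniScrape SDK. It only assembles the JSON body described above, using the field names from this page.

```python
import json
from typing import Optional


def build_scrape_request(url: str, selectors: dict, wait_for: Optional[str] = None) -> dict:
    """Assemble a js_rendering request body (hypothetical helper, not an SDK call)."""
    body = {
        "url": url,
        "mode": "js_rendering",            # run a real browser so JavaScript executes
        "css_selectors": dict(selectors),  # output key -> CSS selector
    }
    if wait_for is not None:
        # Hold extraction until this element exists, skipping the loading skeleton.
        body["js_wait_selector"] = wait_for
    return body


body = build_scrape_request(
    "https://shop.example.com/product/123",
    {"title": "h1.product-title", "price": ".price-amount", "in_stock": ".stock-status"},
    wait_for=".price-amount",
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same content as the -d body of the curl request above. You can therefore pass the same dict as json= to requests.post.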

The response

{
  "success": true,
  "data": {
    "css_extracted": {
      "title": "Wireless Headphones X200",
      "price": "$129.00",
      "in_stock": "In stock"
    },
    "status_code": 200,
    "final_url": "https://shop.example.com/product/123"
  },
  "metadata": {
    "method_used": "js_rendering",
    "elapsed_time": 6.1,
    "solver_used": true,
    "challenge_solved": true
  },
  "billing": { "charged": 0.0035, "balance_after": 49.91 }
}

You get one clean string per selector (three here), with no HTML parsing on your side.
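In the response above, the extracted values are plain strings, so typed fields such as price usually need a small post-processing step. The following is a minimal sketch that assumes a US-style price like "$129.00" and the literal availability text "In stock". Both assumptions depend on the target site. `normalize_product` is a hypothetical helper.

```python
from decimal import Decimal

# The data.css_extracted object from the sample response above.
fields = {
    "title": "Wireless Headphones X200",
    "price": "$129.00",
    "in_stock": "In stock",
}


def normalize_product(raw: dict) -> dict:
    """Convert extracted strings into typed values (hypothetical helper)."""
    # Assumes a US-style price such as "$1,299.00"; other locales need different parsing.
    price_text = raw["price"].strip().lstrip("$").replace(",", "")
    return {
        "title": raw["title"].strip(),
        "price": Decimal(price_text),
        # Assumes the page shows the literal text "In stock" when the item is available.
        "in_stock": raw["in_stock"].strip().lower() == "in stock",
    }


product = normalize_product(fields)
print(product)
```

Decimal is used instead of float so that monetary values such as 129.00 are stored exactly, with no binary rounding error.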

Doing it in Python

import os

import requests

# JS rendering can take several seconds, so allow a generous timeout.
resp = requests.post(
    "https://api.omniscrape.io/v1/scrape",
    headers={"X-API-Key": os.environ["OMNISCRAPE_KEY"]},
    json={
        "url": "https://shop.example.com/product/123",
        "mode": "js_rendering",
        "css_selectors": {
            "title": "h1.product-title",
            "price": ".price-amount",
        },
        "js_wait_selector": ".price-amount",
    },
    timeout=120,
)

data = resp.json()
if data["success"]:
    # Maps each output key to the text its selector matched.
    print(data["data"]["css_extracted"])
else:
    print("Failed:", data.get("error"))
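In a larger script, you may prefer to turn a `success: false` response into an exception instead of printing it. The following is a minimal sketch that relies only on the `success`, `data` and `error` fields shown on this page. `css_fields` and `ScrapeError` are hypothetical names, and the error text in the demo is invented for illustration.

```python
class ScrapeError(RuntimeError):
    """Hypothetical exception for responses with success: false."""


def css_fields(envelope: dict) -> dict:
    """Return data.css_extracted, or raise ScrapeError if the scrape failed."""
    if not envelope.get("success"):
        # The structure of "error" isn't documented here, so pass it through unchanged.
        raise ScrapeError(envelope.get("error", "unknown error"))
    return envelope["data"]["css_extracted"]


# Demo with hand-written envelopes; in practice pass resp.json().
ok = {"success": True, "data": {"css_extracted": {"title": "Wireless Headphones X200"}}}
print(css_fields(ok))

failure = None
try:
    css_fields({"success": False, "error": "illustrative error message"})
except ScrapeError as exc:
    failure = str(exc)
    print("Failed:", failure)
```

When a failure raises instead of printing, a batch job stops (or retries) at the page that broke rather than continuing with missing data.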

Where to go next