Best Practices

A short checklist of habits that keep your scraping reliable, fast, and inexpensive.

Start cheap, escalate only when needed

Use mode: "auto". It tries the cheap, fast path first and only spins up a browser when a page is blocked or needs JavaScript. Pin js_rendering only for sites you know require a browser, and fast only for static content where you want the lowest cost. See Modes.
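As a rough sketch of this policy, the helper below defaults to "auto" and pins another mode only for hosts you have listed yourself. The host lists, the function names, and the "url" field name are illustrative assumptions, not part of the documented API.

```python
from urllib.parse import urlparse

# Hypothetical lists: fill these in from your own experience with each site.
NEEDS_BROWSER = {"app.example.com"}
STATIC_ONLY = {"docs.example.com"}


def pick_mode(url: str) -> str:
    """Return the mode to send for a URL, defaulting to "auto"."""
    host = urlparse(url).hostname or ""
    if host in NEEDS_BROWSER:
        return "js_rendering"
    if host in STATIC_ONLY:
        return "fast"
    return "auto"


def build_payload(url: str) -> dict:
    # "url" as the name of the target field is an assumption.
    return {"url": url, "mode": pick_mode(url)}
```

Keeping the pinned lists short means most traffic still goes through "auto", so the cheap path is tried first.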

Ask for the format you actually need

Don't download full HTML and parse it if you only want three fields. Add css_selectors and get back exactly those values. Want clean text for an LLM? Use output_format: "markdown". Less data transferred means faster, cheaper calls.

Wait for content, not for time

On JavaScript-heavy pages, use js_wait_selector instead of guessing a fixed delay. It returns the moment your target element exists, and only waits the full js_wait_timeout when something is wrong.
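A minimal sketch of a request body that waits on a selector rather than a fixed delay. The "url" field name and the helper itself are assumptions; the units of js_wait_timeout are not stated on this page, so the value is passed through unchanged.

```python
def wait_for_selector_payload(url, selector, timeout=None, mode="auto"):
    """Build a request body that waits for `selector` to appear."""
    payload = {
        "url": url,  # assumed field name for the target URL
        "mode": mode,
        "js_wait_selector": selector,
    }
    if timeout is not None:
        # Upper bound on the wait; check the API reference for the units.
        payload["js_wait_timeout"] = timeout
    return payload
```

For example, `wait_for_selector_payload("https://shop.example.com/item/1", "div.price")` waits for the price element instead of sleeping for a guessed interval.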

Match concurrency to your plan

Run a worker pool sized to your plan's concurrency limit. Going over just produces 429s and wasted retries. See Rate limits.
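One way to cap in-flight requests is a fixed-size thread pool. This is a sketch under assumptions: PLAN_CONCURRENCY is a placeholder for your plan's real limit, and scrape_one is a hypothetical function you supply that performs a single request.

```python
from concurrent.futures import ThreadPoolExecutor

PLAN_CONCURRENCY = 5  # assumption: replace with your plan's actual limit


def scrape_all(urls, scrape_one, max_workers=PLAN_CONCURRENCY):
    """Run scrape_one over urls with at most max_workers requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(scrape_one, urls))
```

Because the pool never runs more than max_workers calls at once, the client stays under the limit without any explicit throttling logic.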

Use sessions for related requests

For pagination, logins, or multi-step flows, reuse one session_id so the same IP and cookies persist. For unrelated URLs, don't share a session; fresh IPs spread load.
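The sketch below contrasts the two cases. It assumes that session_id is a string chosen by the caller and that the target field is named "url"; neither is confirmed on this page, and the helper names are illustrative.

```python
import uuid


def new_session_id() -> str:
    # Assumption: any unique caller-chosen string is accepted as a session_id.
    return uuid.uuid4().hex


def paginated_payloads(base_url: str, pages: int) -> list:
    """Related requests: every page shares one session_id."""
    session_id = new_session_id()
    return [
        {"url": f"{base_url}?page={n}", "session_id": session_id}
        for n in range(1, pages + 1)
    ]


def unrelated_payloads(urls) -> list:
    """Unrelated requests: no session_id, so each can use a fresh IP."""
    return [{"url": u} for u in urls]
```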

Geo-target when the data is regional

Prices, availability, and language change by country. Set proxy: "residential:<cc>" to collect the exact regional view. See Proxies.
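A small sketch for collecting the same page from several countries. It assumes <cc> is a two-letter country code and that lowercase is accepted; the helper names and the "url" field are illustrative.

```python
def residential_proxy(country_code: str) -> str:
    """Build the proxy value for a two-letter country code."""
    cc = country_code.strip().lower()  # assumption: lowercase codes are accepted
    if len(cc) != 2 or not cc.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"residential:{cc}"


def regional_payloads(url: str, countries) -> list:
    """One request body per country, so each regional view is fetched separately."""
    return [{"url": url, "proxy": residential_proxy(cc)} for cc in countries]
```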

Handle both layers of status

A 200 from OmniScrape means we processed your request; the target's own status is in data.status_code. Always check both before trusting the content. See Errors.

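# resp is the HTTP response object returned by your client.
r = resp.json()
if not r["success"]:
    # OmniScrape itself reported a failure.
    handle_api_error(r["error"])
elif r["data"]["status_code"] >= 400:
    # OmniScrape succeeded, but the target site returned an error status.
    handle_target_error(r["data"]["status_code"])
else:
    # Both layers are healthy, so the content can be used.
    use(r["data"]["content"])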

Retry transient failures with backoff

Retry 429/500/502/503 with exponential backoff; never retry 400/401/402. For a stubborn 502, escalate to js_rendering and switch proxy country before giving up.
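The retry rule above could be sketched as follows. Here send is a hypothetical zero-argument callable that performs one request and returns (status_code, body); the delay values and the added jitter are illustrative choices rather than documented recommendations.

```python
import random
import time

RETRYABLE = {429, 500, 502, 503}


def send_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            # Success, or an error such as 400/401/402 that a retry cannot fix.
            return status, body
        if attempt < max_attempts - 1:
            # Double the delay each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

If the final result is still a 502, the caller can rebuild the request with js_rendering and a different proxy country and pass the new send callable through the same function.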

Set a generous client timeout

js_rendering requests can take several seconds. Use an HTTP client timeout of at least 120 seconds so you don't cut off requests that would have succeeded.

Scrape responsibly

Respect each site's Terms of Service and robots.txt.
Only collect data you have a lawful basis to collect; avoid personal data you don't need.
Don't hammer a single domain harder than necessary — pace your crawl.

Keep your key safe

Call OmniScrape from your backend, store the key in an environment variable or secret manager, and rotate it if it ever leaks. See Authentication.
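A minimal sketch of loading the key on the backend. The variable name OMNISCRAPE_API_KEY is a hypothetical choice; use whatever name your deployment or secret manager exposes.

```python
import os


def load_api_key(env_var: str = "OMNISCRAPE_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; refusing to start without a key")
    return key
```

Failing at startup when the variable is missing avoids the temptation to hard-code a fallback key in source control.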
