Best Practices
A short checklist of habits that keep your scraping reliable, fast, and inexpensive.
Start cheap, escalate only when needed
Use mode: "auto". It tries the cheap, fast path first and only spins up a browser when a page is blocked or needs JavaScript. Pin mode: "js_rendering" only for sites you know require a browser, and mode: "fast" only for static content where you want the lowest cost. See Modes.
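One way to apply this is to default every request to "auto" and keep a short list of per-site overrides. The sketch below assumes that js_rendering and fast are values of the mode field and that the target URL goes in a field named url. The host lists are hypothetical placeholders for sites you have already profiled.

```python
from urllib.parse import urlsplit

# Hypothetical per-site overrides. Everything not listed here stays on "auto".
BROWSER_HOSTS = {"app.example.com"}   # known to need JavaScript rendering
STATIC_HOSTS = {"docs.example.org"}   # known static pages: lowest cost


def choose_mode(url: str) -> str:
    host = urlsplit(url).hostname or ""
    if host in BROWSER_HOSTS:
        return "js_rendering"
    if host in STATIC_HOSTS:
        return "fast"
    return "auto"  # cheap path first, browser only if blocked or JS is needed


def payload_for(url: str) -> dict:
    # Assumption: the request body carries the target in a "url" field.
    return {"url": url, "mode": choose_mode(url)}
```

Keeping the override list small means new sites automatically start on the cheapest path that works.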
Ask for the format you actually need
Don't download full HTML and parse it if you only want three fields. Add css_selectors and get back exactly those values. Want clean text for an LLM? Use output_format: "markdown". Less data transferred means faster, cheaper calls.
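For illustration, here is a sketch of building the two kinds of request body this section describes. The exact shape of css_selectors is not given on this page; the sketch assumes a mapping of output field names to CSS selectors, and the url field name and example selectors are likewise assumptions.

```python
from typing import Dict, Optional


def extraction_payload(url: str, fields: Optional[Dict[str, str]] = None,
                       for_llm: bool = False) -> dict:
    payload = {"url": url}
    if fields:
        # Assumed shape: output field name -> CSS selector.
        payload["css_selectors"] = fields
    if for_llm:
        payload["output_format"] = "markdown"  # clean text instead of raw HTML
    return payload


# Only three values come back, not the whole page.
product = extraction_payload(
    "https://shop.example.com/item/42",
    fields={"title": "h1", "price": ".price", "stock": ".availability"},
)
# Readable text for an LLM prompt.
article = extraction_payload("https://blog.example.com/post", for_llm=True)
```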
Wait for content, not for time
On JavaScript-heavy pages, use js_wait_selector instead of guessing a fixed delay. The request returns as soon as your target element appears, and waits the full js_wait_timeout only if the element never shows up.
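A request that waits on an element rather than a fixed delay might look like the sketch below. This page does not state the unit or default of js_wait_timeout; the value here assumes milliseconds, so check the API reference before copying it. Pinning mode to "js_rendering" is also an assumption for a page already known to be JavaScript-heavy.

```python
def wait_for_element_payload(url: str, selector: str, timeout: int = 15_000) -> dict:
    return {
        "url": url,
        "mode": "js_rendering",
        "js_wait_selector": selector,  # return as soon as this element exists
        "js_wait_timeout": timeout,    # upper bound; unit assumed to be milliseconds
    }


payload = wait_for_element_payload("https://spa.example.com/search?q=lamp", "#results .item")
```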
Match concurrency to your plan
Run a worker pool sized to your plan's concurrency limit. Going over just produces 429s and wasted retries. See Rate limits.
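A minimal sketch of a bounded worker pool using the standard library's ThreadPoolExecutor. PLAN_CONCURRENCY is a hypothetical value; set it to your plan's limit. InFlightCounter is a stand-in fetch function for trying the pool locally: it records the peak number of calls in flight, which should never exceed the limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

PLAN_CONCURRENCY = 5  # hypothetical: use your plan's concurrency limit


def scrape_all(urls: Iterable[str], fetch: Callable[[str], dict],
               limit: int = PLAN_CONCURRENCY) -> List[dict]:
    """Run fetch over urls with at most `limit` requests in flight."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # map() returns results in the same order as the input URLs
        return list(pool.map(fetch, urls))


class InFlightCounter:
    """Stub fetch for local testing: tracks peak concurrent calls."""

    def __init__(self, latency_s: float = 0.01):
        self.latency_s = latency_s
        self.current = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __call__(self, url: str) -> dict:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        time.sleep(self.latency_s)  # stand-in for network latency
        with self._lock:
            self.current -= 1
        return {"url": url}
```

In production, replace InFlightCounter with the function that calls OmniScrape; the pool size is the only knob that controls concurrency.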
Use sessions for related requests
For pagination, logins, or multi-step flows, reuse one session_id so the same IP and cookies persist. For unrelated URLs, don't share a session — fresh IPs spread load.
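The sketch below contrasts the two cases. It assumes session_id is a string you choose yourself and that any unique value works; the ?page= URL pattern is a hypothetical example of a paginated listing.

```python
import uuid
from typing import Dict, Iterable, List


def paginated_payloads(base_url: str, pages: int) -> List[Dict[str, str]]:
    # One session_id for the whole flow, so every page shares the same IP and cookies.
    session_id = uuid.uuid4().hex
    return [
        {"url": f"{base_url}?page={n}", "session_id": session_id}
        for n in range(1, pages + 1)
    ]


def independent_payloads(urls: Iterable[str]) -> List[Dict[str, str]]:
    # No session_id: unrelated URLs can each go out through a fresh IP.
    return [{"url": u} for u in urls]
```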
Geo-target when the data is regional
Prices, availability, and language change by country. Set proxy: "residential:<cc>" to collect the exact regional view. See Proxies.
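A small helper can build the proxy value and fan one URL out across several regions. It assumes <cc> is a two-letter ISO 3166-1 alpha-2 country code and that lowercase is accepted; confirm both on the Proxies page.

```python
import re
from typing import Dict, Iterable


def residential_proxy(country_code: str) -> str:
    cc = country_code.strip().lower()
    # Assumption: two-letter ISO 3166-1 alpha-2 codes, lowercase accepted.
    if not re.fullmatch(r"[a-z]{2}", cc):
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"residential:{cc}"


def regional_payloads(url: str, countries: Iterable[str]) -> Dict[str, dict]:
    # Same URL, one request per country, to compare regional prices or stock.
    return {cc: {"url": url, "proxy": residential_proxy(cc)} for cc in countries}
```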
Handle both layers of status
A 200 from OmniScrape means we processed your request; the target's own status is in data.status_code. Always check both before trusting the content. See Errors.
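```python
r = resp.json()  # resp: the HTTP response from your OmniScrape call
if not r["success"]:
    # OmniScrape could not process the request (see Errors)
    handle_api_error(r["error"])
elif r["data"]["status_code"] >= 400:
    # OmniScrape succeeded, but the target site returned an error status
    handle_target_error(r["data"]["status_code"])
else:
    # Both layers are OK, so the content can be trusted
    use(r["data"]["content"])
```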
Retry transient failures with backoff
Retry 429/500/502/503 with exponential backoff; never retry 400/401/402. For a stubborn 502, escalate to js_rendering and switch proxy country before giving up.
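A minimal sketch of this retry policy, assuming a send(payload) function that returns (http_status, body) for one OmniScrape call. What counts as a "stubborn" 502 is not defined here; the sketch assumes two 502s in a row, then switches to mode "js_rendering" and rotates through a hypothetical list of fallback countries. Any status outside 429/500/502/503, including 400/401/402, is raised immediately.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 502, 503}  # transient: back off and try again


class ScrapeFailed(Exception):
    def __init__(self, status: int, body: dict):
        super().__init__(f"OmniScrape returned HTTP {status}")
        self.status = status
        self.body = body


def scrape_with_retry(
    send: Callable[[dict], Tuple[int, dict]],
    payload: dict,
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    fallback_countries: Tuple[str, ...] = ("us", "gb", "de"),  # hypothetical
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    payload = dict(payload)  # local copy: escalation must not mutate the caller's dict
    countries = list(fallback_countries)
    consecutive_502 = 0
    for attempt in range(max_attempts):
        status, body = send(payload)
        if 200 <= status < 300:
            return body  # still check data.status_code for the target's status
        if status not in RETRYABLE:
            raise ScrapeFailed(status, body)  # e.g. 400/401/402: fix the request
        if status == 502:
            consecutive_502 += 1
            # Assumption: two 502s in a row are "stubborn"; escalate to a
            # browser and a different residential country for the next attempt.
            if consecutive_502 >= 2 and countries:
                payload["mode"] = "js_rendering"
                payload["proxy"] = f"residential:{countries.pop(0)}"
        else:
            consecutive_502 = 0
        if attempt < max_attempts - 1:
            delay = base_delay_s * 2 ** attempt  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise ScrapeFailed(status, body)  # attempts exhausted
```

Injecting sleep keeps the policy testable without real waiting; in production the default time.sleep is used.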
Set a generous client timeout
js_rendering requests can take several seconds. Use an HTTP client timeout of at least 120 seconds so you don't cut off requests that would have succeeded.
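With the standard library's urllib, the timeout is passed to urlopen. The endpoint URL and the Bearer authorization header below are hypothetical; take the real ones from the API reference and Authentication pages.

```python
import json
import urllib.request

# Hypothetical endpoint and auth scheme: use the values from the API reference.
API_URL = "https://api.omniscrape.example/v1/scrape"
CLIENT_TIMEOUT_S = 120  # generous enough for js_rendering requests


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed header format
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    # urllib applies the timeout to each blocking socket operation
    # (connect, each read), not to the request as a whole.
    # Non-2xx responses raise urllib.error.HTTPError.
    with urllib.request.urlopen(req, timeout=CLIENT_TIMEOUT_S) as resp:
        return json.loads(resp.read().decode("utf-8"))
```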
Scrape responsibly
- Respect each site's Terms of Service and robots.txt.
- Only collect data you have a lawful basis to collect; avoid personal data you don't need.
- Don't hammer a single domain harder than necessary — pace your crawl.
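Two of these habits can be checked in code. The sketch below uses the standard library's urllib.robotparser on robots.txt text you have already fetched, plus a simple per-host pacer. The 2-second gap is an arbitrary example value, and the pacer is a single-threaded sketch: add a lock before sharing it across a worker pool.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text you have already fetched."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


class DomainPacer:
    """Enforce a minimum gap between consecutive requests to the same host."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.sleep = sleep
        self._last: Dict[str, float] = {}

    def wait(self, url: str) -> None:
        host = urlsplit(url).hostname or ""
        last: Optional[float] = self._last.get(host)
        if last is not None:
            remaining = self.min_interval_s - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self._last[host] = self.clock()
```

Call pacer.wait(url) right before each request; different hosts never block each other.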
Keep your key safe
Call OmniScrape from your backend, store the key in an environment variable or secret manager, and rotate it if it ever leaks. See Authentication.
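On the backend, the key can be read from the environment at startup so it never appears in source code. The variable name OMNISCRAPE_API_KEY is a hypothetical convention; use whatever name your deployment or secret manager injects.

```python
import os


class MissingAPIKey(RuntimeError):
    """Raised when the OmniScrape key is not configured on this backend."""


def load_api_key(env_var: str = "OMNISCRAPE_API_KEY") -> str:
    key = os.environ.get(env_var, "").strip()
    if not key:
        # The message names the variable, never the key itself.
        raise MissingAPIKey(f"{env_var} is not set")
    return key
```

Failing fast at startup surfaces a missing or rotated key before any scraping traffic is sent.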