How to Parallelize Scraping with Vibium
Parallelize scraping with Vibium in Python — run many headless browsers at once with a thread pool to scrape a URL list far faster than a serial loop.
To parallelize scraping with Vibium, give each worker its own headless browser: launch it inside a function, scrape one URL, call quit(), and fan that function out across your URL list with a ThreadPoolExecutor. Each Vibium browser is a fully isolated session, so workers never share navigation or element state; the safe, simple model is one browser per task. Scraping time is dominated by network and browser I/O, and Python releases the GIL while a thread waits on I/O. As a result, threads deliver speedups that grow roughly with the worker count, without the overhead of separate processes, until memory, CPU, or the target site becomes the bottleneck. A list of 100 URLs that takes ten minutes serially can finish in a fraction of that time with eight workers. The main constraint is memory: each headless Chrome is a real process, so you cap concurrency at what your machine can hold instead of launching a browser for every URL at the same time.
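To see why threads pay off for I/O-bound work, here is a minimal standard-library sketch. It replaces the real browser with a hypothetical fake_scrape() that sleeps for 0.1 s, standing in for network and browser waits, and times the same 16 URLs serially and with eight workers. Real pages are not this uniform, so treat it as an illustration of overlapping waits, not a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_scrape(url):
    # Hypothetical stand-in for a real scrape: sleeping releases the GIL,
    # just like waiting on the network or on the browser process does.
    time.sleep(0.1)
    return url


def run_serial(urls):
    start = time.perf_counter()
    results = [fake_scrape(u) for u in urls]
    return results, time.perf_counter() - start


def run_threaded(urls, workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_scrape, urls))
    return results, time.perf_counter() - start


urls = [f"https://example.com/page/{i}" for i in range(16)]
serial_results, serial_secs = run_serial(urls)
threaded_results, threaded_secs = run_threaded(urls, workers=8)
print(f"serial: {serial_secs:.2f}s, 8 workers: {threaded_secs:.2f}s")
```

Serially the 16 waits add up; with eight workers they overlap in two rounds, which is the same effect you get when eight browsers wait on eight page loads at once.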
What is the parallel scraping script?
```python
from concurrent.futures import ThreadPoolExecutor

from vibium import browser_sync as browser

urls = [
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3",
    # ... hundreds more
]


def scrape(url):
    # Each call launches its own isolated headless browser.
    vibe = browser.launch(headless=True)
    try:
        vibe.go(url)
        return {"url": url, "title": vibe.find("h1").text()}
    finally:
        # Always close the browser, even if go() or find() raised.
        vibe.quit()


# At most eight scrape() calls (and eight browsers) run at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(scrape, urls))

for r in results:
    print(r)
```

Each call to scrape() gets its own browser, does its work, and tears down, so at most eight URLs are in flight at any moment. The try/finally guarantees the browser closes even if a page errors, which is what stops leaked Chrome processes from piling up.
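You can check the cleanup guarantee without launching Chrome. This sketch swaps in a hypothetical FakeBrowser class (not part of Vibium) that only counts how many instances are open, makes every URL ending in /bad fail, and confirms that nothing is left open once the pool finishes. It uses pool.submit() so each failure can be caught individually.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeBrowser:
    """Hypothetical stand-in for a Vibium browser; it only tracks open/close."""

    _lock = threading.Lock()
    open_count = 0

    def __init__(self):
        with FakeBrowser._lock:
            FakeBrowser.open_count += 1

    def go(self, url):
        # Simulate a page that fails to load.
        if url.endswith("/bad"):
            raise RuntimeError(f"failed to load {url}")

    def quit(self):
        with FakeBrowser._lock:
            FakeBrowser.open_count -= 1


def scrape(url):
    vibe = FakeBrowser()
    try:
        vibe.go(url)
        return url
    finally:
        vibe.quit()  # runs on success and on error


urls = ["https://example.com/ok", "https://example.com/bad"] * 4
errors = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(scrape, u) for u in urls]
    for f in futures:
        try:
            f.result()
        except RuntimeError as e:
            errors.append(str(e))

print(len(errors), "errors,", FakeBrowser.open_count, "browsers still open")
# prints: 4 errors, 0 browsers still open
```

With the real vibe.quit() in the same finally position, the same reasoning applies to Chrome processes.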
How does each step work?
- browser.launch(headless=True): each worker starts its own isolated Chrome. Headless mode keeps memory and CPU use down, so you can run more workers.
- vibe.go(url): navigates to the page and auto-waits for it to load.
- vibe.find("h1").text(): reads the data you want; swap in findAll() for lists of rows or cards.
- vibe.quit() in finally: closes the browser no matter what, freeing memory for the next task.
- ThreadPoolExecutor(max_workers=8): runs up to eight scrape() calls concurrently, and pool.map returns the results in input order.
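The in-order guarantee of pool.map is easy to rely on without noticing. This standard-library sketch uses a hypothetical slow_then_fast() worker in place of scrape(): earlier items sleep longer, so they tend to finish last, yet pool.map still hands results back in input order.

```python
import time
from concurrent.futures import ThreadPoolExecutor

finish_order = []


def slow_then_fast(n):
    # Earlier items sleep longer, so they usually finish last.
    time.sleep(0.05 * (4 - n))
    finish_order.append(n)
    return n * 10


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_then_fast, [0, 1, 2, 3]))

print("finished:", finish_order)  # usually out of order, e.g. [3, 2, 1, 0]
print("results: ", results)       # always [0, 10, 20, 30]
```

If you would rather handle each result as soon as it is ready, concurrent.futures.as_completed() yields futures in completion order instead.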
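The golden rule: one browser per worker, never one browser shared between threads. The script above goes further and launches a fresh browser for every task, so no browser outlives the URL it was opened for. Sharing a single browser across threads invites race conditions because the workers would fight over the same page state.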
How do I handle failures without losing the whole run?
One bad URL should not crash the batch. Catch errors inside the worker and return them as data, so the pool keeps going and you can inspect what failed afterward:
```python
def scrape(url):
    vibe = browser.launch(headless=True)
    try:
        vibe.go(url)
        return {"url": url, "title": vibe.find("h1").text(), "ok": True}
    except Exception as e:
        # Report the failure as data instead of letting it abort pool.map.
        return {"url": url, "error": str(e), "ok": False}
    finally:
        vibe.quit()


with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(scrape, urls))

failed = [r for r in results if not r["ok"]]
print(f"{len(results) - len(failed)} ok, {len(failed)} failed")
```

Returning errors instead of raising them means a single timeout never sinks the rest of the batch, and you get a clean list of URLs to retry.
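Because failures come back as data, a retry pass is just another round over the failed URLs. The sketch below is one way to do it, assuming scrape() never raises and returns the url/ok dicts shown above. run_with_retries(), its backoff_secs parameter, and the flaky_scrape() demo function are hypothetical names, not Vibium APIs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(scrape, urls, max_workers=8, attempts=3, backoff_secs=1.0):
    """Run scrape() over urls, then re-run only the URLs that came back ok=False.

    Assumes scrape() never raises and always returns a dict with "url" and "ok".
    """
    latest = {}
    pending = list(urls)
    for attempt in range(attempts):
        if not pending:
            break
        if attempt > 0:
            time.sleep(backoff_secs * attempt)  # wait a little longer each round
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            batch = list(pool.map(scrape, pending))
        for r in batch:
            latest[r["url"]] = r  # a retry's result replaces the earlier failure
        pending = [r["url"] for r in batch if not r["ok"]]
    return [latest[u] for u in urls]  # same order as the input list


# Hypothetical flaky scraper for the demo: "/flaky" fails once, "/dead" always fails.
calls = {}
calls_lock = threading.Lock()


def flaky_scrape(url):
    with calls_lock:
        calls[url] = calls.get(url, 0) + 1
        attempt = calls[url]
    if url.endswith("/dead") or (url.endswith("/flaky") and attempt == 1):
        return {"url": url, "error": "simulated failure", "ok": False}
    return {"url": url, "title": "ok", "ok": True}


urls = ["https://example.com/a", "https://example.com/flaky", "https://example.com/dead"]
results = run_with_retries(flaky_scrape, urls, attempts=3, backoff_secs=0.01)
print([r["ok"] for r in results])         # [True, True, False]
print(calls["https://example.com/dead"])  # 3
```

URLs that still fail after the last attempt stay in the results with ok=False, so you can log them or inspect them by hand.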
How many workers should I run?
Start with the number of CPU cores and tune from there. Each headless Chrome typically uses a few hundred megabytes of RAM, so memory, not CPU, is usually your ceiling. A practical heuristic:
- 8–16 workers on a developer laptop with 16 GB of RAM.
- More on a dedicated server — divide available memory by ~400 MB per browser and leave headroom.
- Throttle deliberately — add a small delay or a semaphore if you are hitting one domain, so you do not overwhelm the target or get rate-limited.
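The memory rule of thumb reduces to simple arithmetic. This sketch assumes you already know roughly how much memory is free for browsers (from your OS tools). pick_workers() and its 2 GB default headroom are illustrative assumptions; the ~400 MB per browser comes from the heuristic above. On a laptop with about 8 GB free it suggests 15 workers, inside the 8–16 range.

```python
import os


def pick_workers(available_mb, per_browser_mb=400, headroom_mb=2048, cap=None):
    """Estimate how many headless browsers fit in memory (hypothetical helper).

    available_mb: memory you can give to browsers (read it from your OS tools).
    per_browser_mb: the ~400 MB-per-browser rule of thumb.
    headroom_mb: assumed safety margin kept free for the OS and Python.
    cap: optional upper bound, e.g. the CPU core count as a starting point.
    """
    by_memory = (available_mb - headroom_mb) // per_browser_mb
    workers = max(1, by_memory)  # always allow at least one worker
    if cap is not None:
        workers = min(workers, cap)
    return workers


print(pick_workers(8192))                           # 15
print(pick_workers(8192, cap=os.cpu_count() or 1))  # min(15, your core count)
```

Capping at os.cpu_count() follows the "start with the number of CPU cores" advice; raise the cap if pages are slow to load and the machine still has memory to spare.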
Watch memory while you scale up; if the machine starts swapping, drop the worker count.
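For the throttling advice, one option is a semaphore per domain. This is an illustrative sketch under stated assumptions, not a Vibium feature: the hypothetical polite() wrapper takes any scrape(url) function, lets at most PER_DOMAIN_LIMIT calls hit the same host at once, and pauses DELAY_SECS before freeing the slot. Both limits are made-up values to tune per site, and the demo uses a hypothetical fake_scrape() that records peak concurrency per host.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

# Hypothetical limits: tune them to the site you are scraping.
PER_DOMAIN_LIMIT = 2  # at most 2 scrapes in flight per host
DELAY_SECS = 0.2      # pause before a slot is handed to the next task

_slots = defaultdict(lambda: threading.BoundedSemaphore(PER_DOMAIN_LIMIT))
_slots_lock = threading.Lock()


def domain_slot(url):
    """Return the semaphore guarding this URL's host, creating it on first use."""
    host = urlsplit(url).netloc
    with _slots_lock:  # make first-time creation safe across threads
        return _slots[host]


def polite(scrape):
    """Wrap scrape(url) so each host sees limited concurrency plus a delay."""
    def wrapper(url):
        with domain_slot(url):
            try:
                return scrape(url)
            finally:
                time.sleep(DELAY_SECS)  # hold the slot briefly to space out requests
    return wrapper


# Demo: a fake scraper that records the peak number of concurrent calls per host.
active, peak = defaultdict(int), defaultdict(int)
stats_lock = threading.Lock()


def fake_scrape(url):
    host = urlsplit(url).netloc
    with stats_lock:
        active[host] += 1
        peak[host] = max(peak[host], active[host])
    time.sleep(0.05)  # stand-in for the real page load
    with stats_lock:
        active[host] -= 1
    return url


urls = [f"https://{h}/page/{i}" for h in ("a.example", "b.example") for i in range(6)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(polite(fake_scrape), urls))

print(dict(peak))  # neither host ever exceeds PER_DOMAIN_LIMIT
```

A task waiting on a busy domain still occupies one of the pool's workers, so a large pool pointed mostly at a single host will leave many workers idle.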
Why does one browser per worker beat one shared browser?
A Vibium browser holds a single page's state: the current URL, the focused element, and the session's cookies. If two threads drive the same browser, one worker's go() yanks the page out from under the other. Launching a fresh browser per task keeps each scrape fully independent, which is both simpler to reason about and immune to those races. The cost is memory and a little startup time per task, which the parallelism more than pays back. Because Vibium downloads Chrome once and reuses that install, each launch pays only for starting the browser, not for a download. See how Vibium works for the architecture that makes spinning up many browsers cheap.
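The race is easy to reproduce without a browser. This sketch uses a hypothetical SharedPage class standing in for one browser's page state, and two threading.Event objects to force the bad interleaving deterministically: worker A navigates, worker B navigates the same page, and then A reads a title that belongs to B's URL.

```python
import threading


class SharedPage:
    """Hypothetical stand-in for ONE browser's page state."""

    def __init__(self):
        self.current_url = None

    def go(self, url):
        self.current_url = url

    def title(self):
        return f"title of {self.current_url}"


page = SharedPage()  # a single page shared by both workers: the mistake
a_navigated = threading.Event()
b_navigated = threading.Event()
seen = {}


def worker_a():
    page.go("https://example.com/a")
    a_navigated.set()
    b_navigated.wait()  # force B to navigate before A reads
    seen["a"] = page.title()


def worker_b():
    a_navigated.wait()
    page.go("https://example.com/b")  # yanks the page out from under A
    b_navigated.set()
    seen["b"] = page.title()


threads = [threading.Thread(target=worker_a), threading.Thread(target=worker_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen["a"])  # title of https://example.com/b  (A read B's page)
```

Give each worker its own SharedPage (or, in real code, its own Vibium browser) and there is no shared current_url for the other thread to overwrite.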
Frequently asked questions
How do I parallelize scraping with Vibium?
Give each worker its own browser. Launch a fresh headless Vibium browser inside a function, scrape one URL, then quit() it, and run that function across many URLs with a ThreadPoolExecutor. Each browser is isolated, so workers never share state or trip over each other.
Can one Vibium browser scrape multiple pages at once?
A single browser instance should be driven by one worker at a time. For true parallelism, launch one browser per worker rather than sharing a single instance across threads, which keeps each session's navigation and element state independent and avoids race conditions.
Should I use threads or processes for parallel Vibium scraping?
Threads with a ThreadPoolExecutor work well because the slow part is network and browser I/O, which releases the GIL while waiting. Use processes only if your post-processing is CPU-heavy. Cap the worker count to what your machine's memory can handle, since each browser uses real RAM.
Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.