Web Scraping with Vibium: Complete Guide

Web scraping with Vibium: launch a real Chrome, extract data with find and findAll, handle pagination and JS-heavy pages, and export to CSV or JSON.

By Pramod Dutta · 14 min read · Verified with Vibium 26.2

To scrape a website with Vibium, launch a real Chrome with browser.launch(), navigate with vibe.go(url), extract data with find() (one element) or findAll() (a list), and write the results to CSV or JSON. Vibium is AI-native browser automation built on WebDriver BiDi and shipped as a single Go binary that auto-downloads Chrome, so it drives a genuine browser rather than fetching raw HTML. That is the key advantage for scraping: JavaScript-heavy pages — React, Vue, Angular, infinite feeds — render fully before you read them, and Vibium's auto-waiting find() means you almost never write manual sleeps. The typical pipeline is four steps: launch the browser, navigate to the target, locate and read the data, then export it. This guide walks that pipeline end to end in both Python and JavaScript, then covers pagination, dynamic content, resilient selectors, and the etiquette that keeps your scraper from getting blocked.

What does the Vibium scraping pipeline look like?

Every Vibium scraper follows the same four-stage shape, whatever the site. You launch a browser, point it at a page, pull the fields you want out of the DOM, and persist them.

Each stage maps one-to-one to the code you write: browser.launch() starts Chrome, vibe.go(url) loads the target, find()/findAll() extract the data, and a file write exports it. The rest of this guide expands each stage and shows how to handle the messy real-world cases (pagination, lazy loading, and brittle selectors) without leaving that structure.

What is the minimal Vibium scraping script?

Here is a complete, runnable scraper that pulls every quote and author from a paginated demo site. It is the smallest thing that exercises all four pipeline stages.

from vibium import browser_sync as browser
import csv
 
vibe = browser.launch()
vibe.go("https://quotes.toscrape.com")
 
rows = []
for card in vibe.findAll(".quote"):
    text = card.find(".text").text()
    author = card.find(".author").text()
    rows.append({"quote": text, "author": author})
 
# Explicit UTF-8 so curly quotes and accents survive on every platform.
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["quote", "author"])
    writer.writeheader()
    writer.writerows(rows)
 
print(f"Scraped {len(rows)} quotes")
vibe.quit()

findAll(".quote") returns every matching element as a list. For each card, a scoped card.find(".text") searches only inside that card, so you never cross-wire one quote's text with another's author. The card.find(...).text() calls read the visible text, and the whole batch lands in quotes.csv.

The same script in JavaScript uses the sync client from the vibium/sync subpath:

const { browser } = require('vibium/sync')
const fs = require('fs')
 
const bro = browser.launch()
const page = bro.page()
page.go('https://quotes.toscrape.com')
 
const rows = []
for (const card of page.findAll('.quote')) {
  const quote = card.find('.text').text()
  const author = card.find('.author').text()
  rows.push({ quote, author })
}
 
const header = 'quote,author\n'
const body = rows
  // Quote every field and double any embedded quotes, for both columns.
  .map((r) => [r.quote, r.author].map((v) => `"${v.replace(/"/g, '""')}"`).join(','))
  .join('\n')
fs.writeFileSync('quotes.csv', header + body)
 
console.log(`Scraped ${rows.length} quotes`)
bro.close()

Note the small API differences: in JavaScript you get a page with bro.page() and close with bro.close(); in Python the launched object is your page and you call vibe.quit(). Everything in between — go, find, findAll, text — is identical. If you are new to the toolchain, start with what is Vibium and install Vibium before running these.

How do I find and extract the data I want?

Vibium gives you one finder with two signatures: pass a CSS string for the common case, or an options object for semantic strategies. find() returns a single element and auto-waits; findAll() returns a list immediately.

# Single element by CSS
title = vibe.find("h1.product-title").text()
 
# A list of elements
prices = [el.text() for el in vibe.findAll(".price")]
 
# Read an attribute (href, src, data-*)
first_link = vibe.find("a.result").attr("href")
 
# Semantic find — role, text, label (no brittle CSS)
buy_button = vibe.find(role="button", text="Add to cart")

The three data accessors you will use constantly are text() for visible content, attr(name) for attributes like href or data-id, and value() for form-input values. Because find() waits for the element to be present and actionable, you read the hydrated DOM even on slow pages. See the full find element reference for every selector strategy.

Here is how the accessors compare, so you pick the right one:

| Accessor | Returns | Use it for |
| --- | --- | --- |
| el.text() | Visible text content | Headings, prices, labels, body copy |
| el.attr("href") | An attribute value | Links, image src, data-* ids |
| el.value() | An input's current value | Pre-filled form fields |
| el.html() | Inner HTML | When you need nested markup, not just text |
| el.findAll("css") | Scoped list of children | Rows inside a specific table or card |

How do I scrape data that loads with JavaScript?

Vibium runs a real Chrome, so client-rendered content executes before you read it — but you still need to wait for the right moment. For content that appears after an XHR or a user action, poll a condition with wait_for_function() instead of guessing a delay.

from vibium import browser_sync as browser
 
vibe = browser.launch()
vibe.go("https://example.com/dashboard")
 
# Wait until the async list has actually rendered rows.
vibe.wait_for_function("document.querySelectorAll('.row').length > 0")
 
rows = [el.text() for el in vibe.findAll(".row")]
print(f"Loaded {len(rows)} rows")
vibe.quit()

wait_for_function() re-evaluates the JavaScript condition until it is true or the timeout fires, so the scraper advances the instant the data is ready and never reads an empty shell. This is the single most useful pattern for single-page apps — it is why Vibium handles React apps and other SPAs cleanly where a raw-HTML fetcher returns nothing.

For pages that reveal content only as you scroll, drive the scroll with evaluate() and re-count between rounds:

seen = 0
for _ in range(50):  # safety cap so an endless feed can't hang
    count = len(vibe.findAll(".feed-item"))
    if count == seen:
        break  # nothing new loaded — end of feed
    seen = count
    vibe.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    vibe.wait(800)  # let the next batch fetch and render

The full walkthrough of that technique lives in scrape an infinite-scroll page. The core idea — count, scroll, wait, re-count, stop when the count stops growing — generalizes to any lazy-loaded list.
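
To make the count, scroll, wait, re-count idea concrete without tying it to one site, here is a minimal sketch that takes the counting and scrolling steps as plain callables. The names scroll_until_stable and FakeFeed are illustrative, not part of Vibium; FakeFeed only simulates a feed so the loop can run without a browser.

```python
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    max_rounds: int = 50,
) -> int:
    """Repeat count -> scroll until the count stops growing or the cap is hit."""
    seen = 0
    for _ in range(max_rounds):
        count = count_items()
        if count == seen:
            break  # nothing new appeared since the previous round
        seen = count
        scroll_once()  # in a real scraper this also waits for the next batch
    return seen


class FakeFeed:
    """Stand-in for a page: each scroll reveals one more batch, up to a total."""

    def __init__(self, total: int, batch: int) -> None:
        self.total = total
        self.batch = batch
        self.loaded = min(total, batch)

    def count(self) -> int:
        return self.loaded

    def scroll(self) -> None:
        self.loaded = min(self.total, self.loaded + self.batch)


feed = FakeFeed(total=35, batch=10)
final_count = scroll_until_stable(feed.count, feed.scroll)
print(final_count)
```

With a real page, count_items would wrap len(vibe.findAll(".feed-item")) and scroll_once would wrap the evaluate() scroll plus the short wait shown above.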

How do I scrape multiple pages of results?

Scrape the current page, click the next-page control, wait for the new results, and repeat until the control disappears. Collect rows into one list as you go so an early failure never loses what you already have.

from vibium import browser_sync as browser
 
vibe = browser.launch()
vibe.go("https://quotes.toscrape.com")
 
all_rows = []
for _ in range(20):  # page cap as a safety net
    for card in vibe.findAll(".quote"):
        all_rows.append({
            "quote": card.find(".text").text(),
            "author": card.find(".author").text(),
        })
 
    next_links = vibe.findAll("li.next > a")
    if not next_links:
        break  # no "Next" link — we're on the last page
    next_links[0].click()
    vibe.find(".quote").wait_until("visible")  # wait for page N+1
 
print(f"Scraped {len(all_rows)} quotes across all pages")
vibe.quit()

Two details make this robust. First, findAll("li.next > a") returns an empty list on the final page, giving a clean stop condition rather than an exception. Second, after clicking, find(".quote").wait_until("visible") blocks until the next page's content renders, so you never read the previous page twice. For URL-driven pagination (?page=2), you can instead loop vibe.go() over the page numbers — see paginate results for both styles.
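
For the URL-driven style, the list of page URLs can be built up front with the standard library. This is a small sketch under the assumption that the site uses a numeric query parameter named page; page_urls is a hypothetical helper, and the example.com URL is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(base_url: str, first: int, last: int, param: str = "page") -> list[str]:
    """Return base_url once per page number, replacing any existing page parameter."""
    parts = urlsplit(base_url)
    # Keep every other query parameter, drop a stale page number if present.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    urls = []
    for number in range(first, last + 1):
        query = urlencode(kept + [(param, str(number))])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


urls = page_urls("https://example.com/search?q=laptops", 1, 3)
for url in urls:
    print(url)
```

Each resulting URL can then be passed to vibe.go() in a loop, stopping early when a page comes back with no results.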

How do I export scraped data to CSV or JSON?

Once your rows are a list of dictionaries, exporting is a few lines. CSV suits spreadsheets and flat data; JSON preserves nested structure.

import csv, json
 
# CSV
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["quote", "author"])
    writer.writeheader()
    writer.writerows(all_rows)
 
# JSON
with open("data.json", "w", encoding="utf-8") as f:  # ensure_ascii=False needs UTF-8
    json.dump(all_rows, f, indent=2, ensure_ascii=False)

Keeping extraction and export separate (build the list first, write it last) means you can swap output formats or add a database sink without touching the scraping logic. The exception is a scrape that takes hours: there, also persist partial results inside the loop so an interrupted run can resume instead of starting over.
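
One way to persist partial results is to append each record to a JSON Lines file as soon as it is scraped, and skip records already on disk when the script restarts. The sketch below uses only the standard library; append_record, load_records, and save_new are hypothetical helper names, and using the quote text as the identity key is an assumption made for the demo.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read back every record written so far (empty list if no file yet)."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def save_new(path, records, key="quote"):
    """Write only records whose key is not already on disk; return how many."""
    done = {r[key] for r in load_records(path)}
    written = 0
    for record in records:
        if record[key] in done:
            continue
        append_record(path, record)
        done.add(record[key])
        written += 1
    return written


out_path = os.path.join(tempfile.mkdtemp(), "quotes.jsonl")
first_run = save_new(out_path, [{"quote": "A", "author": "X"}])
# Simulated restart: the first record is already saved, so only "B" is written.
second_run = save_new(out_path, [{"quote": "A", "author": "X"}, {"quote": "B", "author": "Y"}])
print(first_run, second_run, len(load_records(out_path)))
```

Appending line by line means a crash loses at most the record in flight, and the file converts to CSV or a single JSON array at the end of the run.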

How do I scrape an HTML table into rows?

Scrape a table by iterating its rows with findAll("tbody tr"), then reading each cell inside a row with a scoped findAll("td"). Tables are the most common structured-data source on the web, and the scoped-find approach keeps every cell aligned to its row.

from vibium import browser_sync as browser
 
vibe = browser.launch()
vibe.go("https://example.com/report")
 
# Column names come from the header cells.
headers = [th.text() for th in vibe.findAll("table thead th")]
 
rows = []
for tr in vibe.findAll("table tbody tr"):
    cells = [td.text() for td in tr.findAll("td")]
    rows.append(dict(zip(headers, cells)))
 
print(f"Parsed {len(rows)} rows with columns: {headers}")
vibe.quit()

The dict(zip(headers, cells)) line pairs each cell with its column name, so you get labelled records instead of anonymous positional lists — much easier to export and reason about. Because the inner tr.findAll("td") is scoped to a single row, cells never leak between rows even if the table has hundreds of them. A dedicated deep-dive lives in scrape a table with Vibium.
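
One caveat worth guarding against: zip() stops at the shorter of its inputs, so a row with a missing or merged cell produces a record with fewer keys and no error. A minimal sketch of a stricter pairing step follows; row_to_record is a hypothetical helper, and padding short rows with None while rejecting long ones is one possible policy, not the only reasonable one.

```python
def row_to_record(headers, cells):
    """Pair cells with headers, padding short rows and rejecting long ones."""
    cells = [c.strip() for c in cells]
    if len(cells) > len(headers):
        raise ValueError(f"{len(cells)} cells for {len(headers)} headers")
    # Fill any missing trailing cells so every record has the same keys.
    padded = cells + [None] * (len(headers) - len(cells))
    return dict(zip(headers, padded))


headers = ["Name", "Price", "Stock"]
full = row_to_record(headers, [" Laptop A ", "$999", "12"])
short = row_to_record(headers, ["Laptop B", "$1,299"])
print(full)
print(short)
```

Records with a uniform set of keys also export more predictably through csv.DictWriter.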

What does a real-world scrape look like, with error handling?

A production scrape wraps each record in a try/except so one malformed item never kills the whole run, and it saves progress as it goes. Here is an e-commerce product list scraped defensively.

from vibium import browser_sync as browser
import json
 
vibe = browser.launch()
vibe.go("https://example-shop.com/category/laptops")
 
# Dismiss a consent overlay if it's blocking the grid.
banners = vibe.findAll("button#accept-cookies")
if banners:
    banners[0].click()
 
# Wait for the product grid to render before reading it.
vibe.wait_for_function("document.querySelectorAll('.product-card').length > 0")
 
products = []
for card in vibe.findAll(".product-card"):
    try:
        products.append({
            "name": card.find(".product-name").text(),
            "price": card.find(".price").text(),
            "url": card.find("a").attr("href"),
            "in_stock": len(card.findAll(".sold-out")) == 0,
        })
    except Exception as err:
        print(f"Skipping a card: {err}")  # keep going on a broken item
 
with open("laptops.json", "w", encoding="utf-8") as f:
    json.dump(products, f, indent=2, ensure_ascii=False)
 
print(f"Scraped {len(products)} products")
vibe.quit()

Three production habits are on display. The consent banner is dismissed first, since overlays routinely hide the grid. wait_for_function() guarantees the client-rendered cards exist before the loop reads them. And the per-card try/except means a single product with missing markup logs a warning and is skipped, rather than crashing a run that may have taken minutes to reach that point. Deriving in_stock from whether a .sold-out badge exists is a handy trick — you often infer boolean fields from the presence or absence of an element rather than its text.

When should I use Vibium instead of requests plus BeautifulSoup?

Choose based on how the page renders its content. If the data is in the initial HTML, a lightweight HTTP fetch is faster; if the data is drawn by JavaScript or hidden behind interaction, you need a browser.

| Situation | Best tool | Why |
| --- | --- | --- |
| Static server-rendered HTML | requests + BeautifulSoup | No browser overhead; fastest for plain HTML |
| Client-rendered SPA (React/Vue) | Vibium | Runs JS, so the data actually exists in the DOM |
| Infinite scroll / lazy loading | Vibium | Can scroll and wait for new batches |
| Content behind login or clicks | Vibium | Can type, click, and hold a session |
| Thousands of simple API-like pages | requests | Lower memory, trivially parallel |
| Anti-bot pages needing real fingerprint | Vibium | A real Chrome passes checks a bare client fails |

The honest verdict: reach for requests when the HTML already contains your data — it is lighter and you can run far more of it in parallel. Reach for Vibium the moment JavaScript, interaction, or a realistic browser fingerprint enters the picture. Many production pipelines use both: a fast HTTP path for static pages and Vibium for the JS-heavy ones. For scraping that requires signing in first, follow scrape behind a login.
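
A quick way to test which case a page falls into is to take its raw HTML (from view-source or any HTTP client) and check whether a value you can see in the browser appears in the markup outside script and style tags. The sketch below does that with the standard library's HTMLParser; visible_text_contains is a hypothetical helper and both HTML snippets are made-up examples.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text_contains(html, expected):
    """True if `expected` appears in the page's markup text, not just in scripts."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return expected in " ".join(collector.chunks)


static_html = '<html><body><h1>Laptop X</h1><span class="price">$999</span></body></html>'
spa_html = '<html><body><div id="root"></div><script>window.DATA = {"price": "$999"}</script></body></html>'

print(visible_text_contains(static_html, "$999"))
print(visible_text_contains(spa_html, "$999"))
```

If the check succeeds on the raw HTML, a plain HTTP fetch is probably enough; if the value only shows up after the page runs its scripts, that points toward a browser.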

How do I make a Vibium scraper reliable?

Reliability comes down to stable selectors, honest waits, and defensive loops. Sites change their markup, networks stall, and lists get virtualized — a good scraper expects all three.

  • Prefer stable selectors. Target data-* attributes, ids, or ARIA roles over deep CSS chains like div > div:nth-child(3) > span. Vibium's semantic find(role=..., text=...) survives redesigns that break positional CSS.
  • Let auto-waiting do its job. find() and click() already wait for actionability. Add wait_for_function() only for content that loads after an event; avoid fixed wait() sleeps except as a last resort.
  • Cap every loop. Pagination and scroll loops need a page or round cap so a broken stop condition can't hang the run forever.
  • Dedupe by a stable id. Virtualized lists re-render the same items as you scroll; skip duplicates with a seen set keyed on el.attr("data-id").
  • Screenshot on failure. When a selector returns nothing, capture the page with screenshot() so you can see whether markup changed or a consent banner blocked the content.
  • Handle interstitials first. Dismiss cookie and consent overlays before scraping — see handle a cookie banner — since they often hide the very elements you want.
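
The dedupe habit from the list above can be sketched in plain Python. Here each scroll round is represented as a list of dictionaries already extracted from the page; collect_unique is a hypothetical helper, and the "id" key stands in for whatever stable attribute the site exposes (for example the data-id read with el.attr("data-id")).

```python
def collect_unique(batches, key="id"):
    """Merge batches in order, keeping only the first record seen for each key."""
    seen = set()
    unique = []
    for batch in batches:
        for item in batch:
            if item[key] in seen:
                continue  # the virtualized list re-rendered an item we already have
            seen.add(item[key])
            unique.append(item)
    return unique


round_1 = [{"id": "a1", "title": "First"}, {"id": "a2", "title": "Second"}]
round_2 = [{"id": "a2", "title": "Second"}, {"id": "a3", "title": "Third"}]

items = collect_unique([round_1, round_2])
print([item["id"] for item in items])
```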

Structuring selectors and flows as reusable objects pays off once you have more than one scraper; the page object model pattern keeps locators in one place so a site change is a one-line fix.

How do I run a Vibium scraper headless and at scale?

Run without a visible window by launching headless, which is what you want on a server or in CI where there is no display. Everything else about the script stays the same.

from vibium import browser_sync as browser
 
vibe = browser.launch(headless=True)
vibe.go("https://example.com")
# ... same find / findAll / export logic ...
vibe.quit()

const { browser } = require('vibium/sync')
 
const bro = browser.launch({ headless: true })
const page = bro.page()
page.go('https://example.com')
// ... same extraction logic ...
bro.close()

Because Vibium is a single Go binary that auto-downloads Chrome, deploying to a server is mostly just installing the package; there is no separate driver to version-match. For scraping many URLs, a safe and economical approach is to reuse one browser and give each job its own isolated context; each context has its own cookies and storage, so sessions never bleed together.

# Reuse one browser, isolate each job in its own context.
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
 
vibe = browser.launch(headless=True)
results = {}
for url in urls:
    ctx = vibe.new_context()   # fresh cookies/storage per job
    page = ctx.new_page()
    page.go(url)
    results[url] = page.find("h1").text()
    ctx.close()                # free the tab before the next job
 
vibe.quit()

Reusing one browser process across jobs is far lighter than launching a fresh Chrome per URL, while new_context() keeps each job's state clean. For running several pages truly concurrently, use the async API and gather tasks — the parallel scraping guide covers that pattern, and run on a server covers the deployment specifics like sandbox flags and headless dependencies.
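
The gather pattern itself is ordinary asyncio, and it is worth bounding it so only a few pages are open at once. The sketch below shows that shape with a semaphore; scrape_one is a placeholder coroutine that only sleeps, standing in for whatever async browser calls your scraper makes, and none of these names come from Vibium.

```python
import asyncio


async def scrape_one(url: str) -> str:
    """Placeholder job: a real scraper would open a page and read data here."""
    await asyncio.sleep(0.01)
    return f"title for {url}"


async def scrape_all(urls, limit: int = 2) -> dict:
    """Run scrape_one for every URL with at most `limit` jobs in flight."""
    gate = asyncio.Semaphore(limit)

    async def bounded(url):
        async with gate:
            return url, await scrape_one(url)

    # gather preserves input order, so results line up with urls.
    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(scrape_all(urls))
print(results)
```

The limit doubles as a politeness control: a small value keeps memory use and load on the target site predictable.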

Is web scraping legal, and how do I scrape responsibly?

Scraping publicly visible data is generally permissible, but "generally" is doing real work in that sentence: the specifics depend on the site's terms of service, its robots.txt, and the data-protection and copyright laws where you operate. Treat the following as ground rules, not legal advice.

  • Read the terms of service and robots.txt before scraping, and honor any crawl-delay or disallowed paths.
  • Rate-limit yourself. Add delays and concurrency limits so you never degrade the site for real users; a scraper that hammers a server can cause real harm and get your IP banned.
  • Avoid personal and gated data you have no right to collect. Public product listings are very different from private profiles or login-only content.
  • Cache and reuse rather than re-scraping the same pages repeatedly.
  • Identify honestly where a site expects it, and prefer an official API when one exists.

When in doubt, ask permission or use a published API. Ethical scraping is also more durable scraping — a considerate crawler is far less likely to be blocked.
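
Checking robots.txt can be automated with Python's standard urllib.robotparser. The sketch below parses an inline example file so it runs offline; the rules, the example-scraper user agent, and the URLs are all made up for illustration, and the 1-second fallback delay is an arbitrary assumption.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; a real scraper would load the site's actual robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""
AGENT = "example-scraper"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

candidates = [
    "https://example.com/catalog",
    "https://example.com/private/accounts",
]
# Keep only the URLs the rules allow for this user agent.
allowed = [url for url in candidates if parser.can_fetch(AGENT, url)]

# Honor the site's crawl-delay if it declares one, else fall back to 1 second.
delay = parser.crawl_delay(AGENT) or 1.0

print(allowed)
print(delay)
```

Against a live site, call parser.set_url() with the robots.txt address followed by parser.read() instead of parse(), then sleep for the delay between page loads.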

Frequently asked questions

How do I scrape a website with Vibium?

Launch a browser with browser.launch(), navigate with vibe.go(url), then read data with vibe.find() for a single element or vibe.findAll() for a list. Loop over the results, pull text and attributes, and write them to CSV or JSON. Vibium auto-waits for elements, so manual sleeps are rarely needed.

Is Vibium good for scraping JavaScript-heavy sites?

Yes. Vibium drives a real Chrome over WebDriver BiDi, so client-rendered React, Vue, and Angular pages execute their JavaScript before you read the DOM. Because find() auto-waits for elements to appear and become actionable, you scrape the fully hydrated page instead of an empty HTML shell.

Do I need to write my own waits when scraping with Vibium?

Rarely. Vibium's find() and click() wait for elements to be visible and actionable automatically. For content that loads after an XHR or on scroll, use vibe.wait_for_function() to poll a condition, which is more reliable than a fixed sleep and adapts to slow networks.

How do I scrape multiple pages of results with Vibium?

Scrape the current page with findAll(), click the next-page link, then wait for the new results to render before reading again. Loop until the next button disappears or a page cap is hit. Collecting rows into a list as you go keeps the data safe if the run stops early.

Is web scraping with Vibium legal?

Scraping public data is generally permitted, but it depends on the site's terms of service, its robots.txt, and local laws on data and copyright. Always check the terms, respect rate limits, avoid personal or login-gated data you have no right to, and never overload a server.

How is Vibium different from BeautifulSoup or requests for scraping?

requests plus BeautifulSoup fetch raw HTML and cannot run JavaScript, so they miss content rendered on the client. Vibium runs a full browser, so it sees the same page a user does — ideal for single-page apps, lazy-loaded lists, and pages behind interactive UI. For static HTML, requests is lighter and faster.

Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.
