Parallel Vibium Testing at Scale

A field guide to parallel Vibium testing at scale: shard across CI machines, size workers by memory, pool browsers vs contexts, and keep flake near zero.

By Pramod Dutta · June 23, 2026 · 15 min read · Verified with Vibium 26.2

To run parallel Vibium testing at scale, combine two multipliers: many worker processes per machine, then shard the whole suite across several machines — and give every worker its own isolated Vibium browser or context so nothing is ever shared. Vibium is AI-native browser automation built on WebDriver BiDi and shipped as a single Go binary by Selenium and Appium co-creator Jason Huggins, which is exactly why it scales cleanly: there is no driver to match, no Grid to babysit, and each machine auto-downloads its own Chrome. The scaling recipe is mechanical. Within a machine, run one browser or context per worker and size the worker count by memory, not CPU. Across machines, split the test files into shards and run them on a CI matrix. Then merge reports and artifacts at the end. Because browser tests are dominated by network and rendering I/O, this yields near-linear speedups, turning a 40-minute serial suite into a two-minute pipeline once you spread it across enough isolated workers.

The scale pipeline

This article assumes you already know the basics of running tests concurrently. If you do not, start with how to parallelize Vibium tests first, then come back here for the at-scale techniques: cross-machine sharding, worker-sizing math, browser pooling, and keeping flake near zero when hundreds of tests run at once.

What does "at scale" actually change?

At scale, the bottleneck moves from your code to your hardware and orchestration. A ten-test suite runs fine with any setup; the interesting problems only appear at hundreds or thousands of tests across a shared CI fleet. Three things change and each needs a deliberate answer.

First, one machine is no longer enough: you have to shard the suite across many runners. Second, memory becomes the hard ceiling, so worker sizing stops being "use all the cores" and becomes an arithmetic problem. Third, flake compounds: a 1-in-500 failure rate per test is invisible in a small suite, but across 800 tests it fails roughly four pipelines in five. Solving all three is what this guide is about.
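The compounding effect is easy to check with a few lines of arithmetic. The sketch below assumes every test flakes independently at the same rate, which is a simplification; the function name is illustrative.

```python
def pipeline_failure_probability(per_test_flake_rate: float, test_count: int) -> float:
    """Chance that at least one of `test_count` independent tests flakes in a run."""
    return 1.0 - (1.0 - per_test_flake_rate) ** test_count


# Same 1-in-500 flake rate, two suite sizes.
small = pipeline_failure_probability(1 / 500, 10)
large = pipeline_failure_probability(1 / 500, 800)

print(f"10 tests:  {small:.1%} of pipelines see a flake")
print(f"800 tests: {large:.1%} of pipelines see a flake")
```

With ten tests the rate stays around 2%, while at 800 tests it climbs to roughly 80%, so a flake rate that looks negligible per test dominates the pipeline.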

The good news is that Vibium's design removes an entire category of scaling pain. There is no shared driver process, no chromedriver version to align, and no Grid hub that becomes a single point of failure. Every worker is a self-contained browser session, so scaling is mostly about spawning more of them safely.

What are the two axes of parallelism?

Parallel Vibium scaling has two independent multipliers, and you use both at once. Understanding them separately is the key to reasoning about total throughput.

The intra-machine axis is worker processes on one box: pytest -n 8 or a Node/Jest worker pool. The inter-machine axis is sharding: splitting the suite across N CI runners that each execute a slice. Your total concurrency is roughly machines × workers-per-machine, so 4 machines running 8 workers each gives you 32 browsers in flight.

Axis	Mechanism	Scales with	Limited by
Intra-machine (workers)	`pytest-xdist` (`pytest -n`), Jest workers	CPU cores + RAM on one box	Memory per browser
Inter-machine (shards)	CI matrix / `--shard`	Number of runners you can pay for	CI concurrency + cost
AI verification concurrency	`page.check()` calls	LLM rate limits	API quota, not the browser

The practical implication: do not try to cram 32 workers onto one machine to hit a throughput target. Two 16-worker machines beat one 32-worker machine because you avoid memory pressure on any single box. Spread wide before you stack deep.
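As a rough sketch of how the two multipliers combine, the helper below computes browsers in flight and an idealized wall-clock time. It assumes a perfectly even split and zero startup overhead, so the result is a lower bound, not a prediction; the function names are illustrative.

```python
def total_concurrency(machines: int, workers_per_machine: int) -> int:
    """Browsers in flight = machines x workers per machine."""
    return machines * workers_per_machine


def ideal_wall_clock_minutes(serial_minutes: float, browsers: int) -> float:
    """Lower bound on runtime: even split, no launch or merge overhead."""
    return serial_minutes / browsers


browsers = total_concurrency(machines=4, workers_per_machine=8)
print(browsers)                                # 32
print(ideal_wall_clock_minutes(40, browsers))  # 1.25
```

Real pipelines land above this bound because of browser launches, uneven shards, and report merging.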

How many workers should one machine run?

Size the worker count by usable memory divided by per-browser memory, not by CPU core count. This is the single most important number to get right, because an oversubscribed machine is slower and flakier than a correctly sized one — a box that pages to disk crawls.

Each headless Chrome that Vibium launches is a real browser, made up of several OS processes, and uses roughly 300-500 MB in total depending on the page. The formula is simple:

workers ≈ (usable_RAM_MB × 0.8) / 400

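The 0.8 leaves headroom for the OS, the test runner, and memory spikes on heavy pages. Treat the result as a ceiling, not a target: on an 8 GB machine the formula works out to about 16 workers, while the starting points below are far more conservative, at roughly one worker per 2 GB of RAM. That wider margin also covers pages heavier than 400 MB and anything else running on the box:

Machine RAM	Safe starting workers	Notes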
8 GB	3-4	Small runners; prefer more machines
16 GB	6-8	Typical hosted CI runner
32 GB	12-16	Beefy runner or a dev workstation
64 GB	24-32	Self-hosted; watch I/O and network too

On memory-constrained CI, an explicit -n 6 often beats -n auto, because auto targets CPU count and can over-subscribe RAM on a high-core, low-memory runner. Start at the table value, watch peak memory for one run, then tune. If the machine swaps, drop the count — no exceptions.
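The sizing formula can be sketched as a small helper. Here usable RAM is assumed to be memory you have measured as free for browsers, and the 3000 MB figure is purely illustrative. Integer math keeps the rounding predictable.

```python
def max_workers(usable_ram_mb: int, per_browser_mb: int = 400, headroom_percent: int = 80) -> int:
    """workers ~= (usable_RAM_MB x 0.8) / per_browser_MB, rounded down, never below 1."""
    return max(1, (usable_ram_mb * headroom_percent) // (100 * per_browser_mb))


# Hypothetical runner with 3000 MB measured as free for browsers.
for per_browser in (300, 400, 500):
    print(per_browser, "MB per browser ->", max_workers(3000, per_browser), "workers")
```

Running the range of 300-500 MB per browser shows how sensitive the answer is to page weight, which is why measuring peak memory on one real run matters more than the formula itself.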

Should each test get a new browser or a new context?

Prefer a fresh context per test with one browser per worker; reach for a brand-new browser per test only when you need full process isolation. A Vibium context is an isolated cookie jar and storage sandbox created with new_context() (newContext() in JavaScript), so tests stay independent without paying a full browser launch every time.

The trade-off is startup cost versus isolation strength. Launching a browser per test is the simplest and most bulletproof isolation, but it is also the slowest, because you pay the full launch cost on every single test. Reusing one browser and creating a context per test skips that launch while keeping cookies, localStorage, and session state fully separate.

Strategy	Isolation	Startup cost	Use when
Browser per test	Full OS process isolation	Highest	A test crashes Chrome, or you need distinct browser flags
Context per test (recommended)	Separate cookies/storage	Low	The default for large suites
Shared context	None	Zero	Never for parallel tests — races on state

Here is the fast pattern in Python — one browser per worker, a clean context handed to each test:

# conftest.py
import os
import pytest
from vibium import browser_sync as browser
 
 
@pytest.fixture(scope="session")
def shared_browser():
    instance = browser.launch(headless=os.getenv("HEADLESS", "true") == "true")
    yield instance
    instance.quit()
 
 
@pytest.fixture
def vibe(shared_browser):
    ctx = shared_browser.new_context()   # isolated cookies + storage
    page = ctx.new_page()
    yield page
    ctx.close()

With pytest-xdist, every worker process gets its own shared_browser (session-scoped fixtures are created once per worker process), and inside a worker each test gets a clean context. You get parallelism across workers and isolation within them, at far lower per-test overhead than launching a browser every time. The same idea in JavaScript uses the sync client:

// support/browser.js
const { browser } = require('vibium/sync')
 
// One browser per worker process (Jest/Mocha spawns one per worker).
const bro = browser.launch({ headless: true })
 
function freshPage() {
  const ctx = bro.newContext()   // isolated cookies + storage
  return { page: ctx.newPage(), ctx }
}
 
module.exports = { bro, freshPage }

Each test calls freshPage(), drives page, and closes ctx when done. For the full fixture design behind this, see how to structure a Vibium test suite.

How do you shard a suite across multiple machines?

Split the test files into N groups — one per machine — with a CI matrix, run each slice in parallel internally, then merge results at the end. Sharding is the inter-machine multiplier, and it is what takes you from "fast on my laptop" to "fast for the whole team."

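Both major runners can be sharded. Jest exposes --shard=INDEX/TOTAL natively. pytest has no built-in shard flag, so either split by test path or use the pytest-split plugin (--splits N --group M), with pytest-xdist supplying the workers inside each shard. Point each matrix leg at its own slice: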

# Machine 1 of 4 (Jest)
npx jest --shard=1/4
 
# Machine 2 of 4
npx jest --shard=2/4

# pytest — split by test path per matrix leg, then xdist within each
pytest tests/shard_1 -n auto
pytest tests/shard_2 -n auto

A GitHub Actions matrix wires this up cleanly. Each leg is a full machine that runs its shard with intra-machine workers, so you multiply both axes at once:

# .github/workflows/e2e.yml
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let all shards finish; see the full failure picture
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install vibium pytest pytest-xdist pytest-split   # pytest-split provides --splits/--group
      - run: vibium install       # cache Chrome for Testing across runs
      - run: pytest --splits 4 --group ${{ matrix.shard }} -n auto
        env:
          HEADLESS: "true"

Four machines, each running -n auto workers, gives you 4 × cores browsers in flight. Set fail-fast: false so one shard failing does not cancel the others — at scale you want the complete failure picture from every leg, not a partial one. For the full CI setup including caching and artifacts, see running Vibium in CI/CD with GitHub Actions.

How do you split shards evenly so no machine straggles?

Balance shards by historical runtime, not by file count, so every machine finishes at roughly the same time. The slowest shard sets your pipeline's wall-clock time, so one lopsided leg wastes all the parallelism you paid for.

Naive alphabetical splitting is the common trap: if test_checkout.py happens to hold your ten slowest end-to-end journeys, that shard runs long while the others sit idle. The fix is duration-based splitting. Tools like pytest-split record each test's runtime and pack shards to equal time, and several Jest sharding helpers do the same.

Record durations once on a green run (with pytest-split, pass --store-durations), then feed them back so future runs split by measured time.
Keep shard count stable — durations only balance well if TOTAL does not change every commit.
Watch the straggler: if one shard is consistently 30% slower, your durations are stale or a test regressed. Re-record.

A well-balanced 4-way shard should see all legs finish within ~10% of each other. If they do not, you are leaving speed on the table.
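To make duration-based splitting concrete, here is a minimal sketch of one common approach: greedy longest-first packing, where the next-slowest file always goes to the currently lightest shard. This is an illustration of the idea, not the algorithm any particular plugin uses, and the file names and timings are made up.

```python
import heapq


def pack_shards(durations: dict[str, float], shard_count: int) -> list[list[str]]:
    """Greedy longest-first packing: give each file to the lightest shard so far."""
    # Heap entries are (total_seconds, shard_index), so the lightest shard pops first.
    heap = [(0.0, i) for i in range(shard_count)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    # Slowest files first; the name breaks ties so the split is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


# Hypothetical recorded runtimes in seconds.
durations = {
    "test_checkout.py": 300.0,
    "test_search.py": 120.0,
    "test_login.py": 90.0,
    "test_profile.py": 90.0,
    "test_cart.py": 60.0,
    "test_footer.py": 30.0,
}
shards = pack_shards(durations, 2)
totals = [sum(durations[name] for name in shard) for shard in shards]
print(shards)
print(totals)  # [360.0, 330.0]
```

An alphabetical split of the same six files would put test_checkout.py with two other files on one leg and leave the other leg far lighter; packing by duration keeps the two totals within 30 seconds of each other.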

What breaks at scale, and how do you keep flake near zero?

Flake at scale comes almost entirely from two sources: shared state between concurrent tests, and an oversubscribed machine. Fix both and a 1000-test parallel suite is as reliable as a 10-test one. Vibium removes the third classic cause — timing flake — because its find() auto-waits for actionability, so you never sprinkle sleep() to paper over races.

Shared state is the number-one killer. Two workers driving the same browser page means one test's go() yanks the page out from under the other. The rule is absolute: one browser or one context per test, never a shared page. If a test reads or writes a shared backend record (the same user account, the same cart), give each worker its own seed data or a fresh account.

Oversubscription is the number-two killer. A machine paging to disk makes every browser sluggish, which surfaces as random timeouts that look like product bugs but are really resource starvation. The fix is the sizing math above: back off worker count until peak memory sits under ~80%.

Symptom at scale	Likely cause	Fix
Passes solo, fails in parallel	Shared page/context/data	One context per test; per-worker seed data
Random timeouts under load	Memory oversubscription / swapping	Lower worker count; add machines
One shard always slow	Unbalanced split	Split by recorded duration
Chrome processes leak after a run	Missing teardown on failure	`try/finally` → `ctx.close()` / `quit()`
Passes on rerun, not first try	Timing race in the app	Auto-waiting `find()`; avoid `page.wait(ms)`

For a deeper checklist, see writing flake-free Vibium tests and Vibium waiting strategies. The teardown point is worth emphasizing: always wrap cleanup so a failing test still closes its context, or workers slowly leak Chrome processes until the machine runs out of memory mid-run.
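The teardown rule can be sketched with a context manager that closes the context even when the test body raises. FakeContext below is a stand-in so the example runs without a browser; in a real suite you would pass the browser's own context factory (for example shared_browser.new_context from the fixture earlier) instead.

```python
from contextlib import contextmanager


class FakeContext:
    """Stand-in for a browser context; it only records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def isolated_context(new_context):
    """Yield a fresh context and always close it, even if the test fails."""
    ctx = new_context()
    try:
        yield ctx
    finally:
        ctx.close()


created = []


def make_context() -> FakeContext:
    ctx = FakeContext()
    created.append(ctx)
    return ctx


try:
    with isolated_context(make_context) as ctx:
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

print(created[0].closed)  # True: cleanup ran despite the failure
```

A pytest fixture that yields and then closes gives the same guarantee, since pytest runs the code after yield whether the test passed or failed.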

How do you keep AI verification affordable in parallel?

Vibium's AI-native page.check() runs a screenshot through a multimodal model, so at scale its bottleneck is your LLM rate limit and cost — not the browser. Treat AI checks as a separate concurrency budget from your browser workers.

// One high-value AI assertion instead of ten brittle DOM checks
const { passed, reason } = page.check('the dashboard shows 3 widgets and no error banner')
expect(passed).toBe(true)   // reason gives you a human-readable failure message

Because each check() is an API call, 32 browser workers all calling check() at once can hit provider rate limits long before they exhaust memory. Two tactics keep it sane. First, reserve page.check() for the assertions where plain-English intent genuinely beats a selector, such as a visual layout or a "no errors visible" sweep. Second, use deterministic el.text() or el.isVisible() for the mechanical checks. Together these keep the token bill and the request volume proportional to the value you get, which matters most when hundreds of tests fan out at once.
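One way to treat AI checks as their own budget is to give each worker an equal slice of the provider's rate limit and space its calls accordingly. The sketch below is an assumption-laden illustration: CheckThrottle is a hypothetical helper, not part of Vibium, and the 60 requests per minute limit is invented for the example.

```python
import time


class CheckThrottle:
    """Spaces out calls so one worker stays inside its share of a rate limit."""

    def __init__(self, provider_limit_per_minute, total_workers,
                 clock=time.monotonic, sleep=time.sleep):
        # Each worker gets an equal slice of the per-minute quota.
        self.min_interval = 60.0 * total_workers / provider_limit_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until this worker may issue its next AI check."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval


# Hypothetical: 60 requests/minute shared by 32 workers.
throttle = CheckThrottle(provider_limit_per_minute=60, total_workers=32)
print(throttle.min_interval)  # 32.0 seconds between checks per worker
```

Each worker would call throttle.wait() just before page.check(). Because workers are separate processes, a per-worker interval like this needs no shared state, at the cost of leaving quota unused when some workers make no checks.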

How do you scale beyond hosted CI limits?

When your shard count outgrows your CI plan's concurrency, move to self-hosted or autoscaling runners so machines spin up on demand and disappear when the run ends. This is the step teams reach once a full matrix would otherwise queue behind the plan's parallel-job cap.

Because a Vibium runner is self-contained (the binary downloads its own Chrome), a runner image is trivial to bake: install Python or Node, pip install vibium (or npm install vibium), run vibium install once so Chrome for Testing is pre-cached in the image, and you are done. There is no Grid to register against and no driver to align, so an autoscaler can add identical, stateless runners freely.

Bake Chrome into the image with vibium install so a cold runner does not re-download the browser on every job.
Right-size the instance type to the worker math — a machine with more RAM lets each runner carry more workers, often cheaper than twice as many small runners.
Scale to zero between runs; stateless runners mean you only pay while the pipeline is active.

Cost scales with machines × minutes, so the duration-balanced sharding from earlier pays off twice: it shortens wall-clock time and trims the compute bill, because no machine sits idle waiting on a straggler shard.

Does Vibium replace a Selenium Grid at scale?

For most teams, yes — you scale by adding machines to a CI matrix instead of running a Grid hub and nodes. Because Vibium is a single Go binary that auto-downloads its own Chrome, every runner is self-contained, which eliminates the hub-and-node topology that a Selenium Grid requires.

The Grid model centralizes sessions through a hub that routes to nodes; it is powerful but is also infrastructure you own, patch, and debug — and the hub is a single point of failure under heavy load. The matrix-shard model has no central broker: each machine is independent, so there is nothing to become a bottleneck.

Concern	Selenium Grid	Vibium matrix sharding
Topology	Hub + nodes to maintain	Independent runners, no hub
Driver management	Match driver to browser per node	None — binary carries Chrome
Single point of failure	The hub	None
Cross-browser fleet	Strong (Firefox, Safari, etc.)	Chrome-focused today
Best for	Large mixed-browser labs	Fast Chrome-based CI at scale

Choose Vibium's matrix approach when your target is Chrome and you want the simplest path to fast, wide parallelism. A managed or self-hosted Grid still earns its keep for large cross-browser fleets or when you must centralize sessions for policy reasons. For the deterministic-vs-AI comparison against the other big engine, see Vibium vs Playwright.

What does a scaled run look like end to end?

Putting it together: shard the suite by duration across a matrix, run memory-sized workers on each machine, hand every test its own context, retry only genuine flakes, and merge the reports. That is the whole recipe, and each piece has a home elsewhere on this site.

Files → shards by recorded runtime, one shard per matrix leg.
Shard → workers with -n auto (or an explicit count) sized to RAM.
Worker → browser, one per worker, reused across that worker's tests.
Test → context, fresh new_context() per test for isolation.
Failure → artifact: screenshot on failure and upload it per shard.
Shards → merge: combine JUnit/HTML reports so you see one result.

A team that adopts this typically watches a 40-minute serial suite drop to a couple of minutes of wall-clock time, because the work is spread across dozens of isolated browsers doing I/O in parallel. The ceiling is your CI budget and your app's own rate limits, not Vibium — the binary is happy to launch as many browsers as the hardware can hold.

Next steps

How to parallelize Vibium tests — the intra-machine foundations to master first.
Running Vibium in CI/CD with GitHub Actions — caching, matrices, and artifacts.
How to structure a Vibium test suite — fixtures and context pooling.
Writing flake-free Vibium tests — kill flake before it compounds.
Vibium vs Selenium — matrix sharding compared with a Grid.
What is Vibium and install Vibium — the fundamentals.

Frequently asked questions

How do you run Vibium tests in parallel at scale?

Combine two axes: run many worker processes on each machine (pytest -n auto or a Jest/Node worker pool), then shard the whole suite across several CI machines. Give every worker its own Vibium browser or context so nothing is shared, and size the worker count by available RAM, not CPU.

How many parallel Vibium workers can one machine handle?

Memory is the ceiling, not CPU. Each headless Chrome uses roughly 300-500 MB, so dividing usable RAM by about 400 MB after leaving 20% headroom gives you an upper bound. In practice, start lower: a 16 GB runner comfortably runs 6-8 workers, and a 64 GB machine can push 24-32 before it starts swapping.

Should each parallel Vibium test get a new browser or a new context?

Use a fresh context per test for speed and one browser per worker. A Vibium context is an isolated cookie and storage sandbox, so tests stay independent while skipping a full browser launch each time. Launch a brand-new browser per test only when you need full process isolation.

How do you shard a Vibium suite across multiple CI machines?

Split the test files into N groups, one per machine, using a matrix job. Jest exposes --shard=INDEX/TOTAL; with pytest, use the pytest-split plugin (--splits N --group M) or split by test path, and let pytest-xdist run the workers inside each shard. Each machine runs its slice in parallel internally, then you merge the reports and artifacts at the end.

Why do parallel Vibium tests become flaky at scale, and how do you fix it?

Flake at scale almost always comes from shared state or an oversubscribed machine. Give every worker its own browser or context, never a shared page, and drop the worker count if the box is swapping. Vibium's built-in auto-waiting removes the timing flake that fixed sleeps only paper over.

Can Vibium replace a Selenium Grid for large parallel runs?

Often yes. Because Vibium is a single Go binary that auto-downloads Chrome, each CI machine is self-contained, so you scale by adding machines to a matrix instead of maintaining a hub-and-node Grid. For very large or cross-browser fleets, a managed grid still has a role.

Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.
