Visual Regression Testing with Vibium

A complete guide to visual regression testing with Vibium — capture PNG baselines, diff every run, stabilize flaky pixels, and gate CI on UI changes.

By Pramod Dutta·June 23, 2026·14 min read·Verified with Vibium 26.2

Visual regression testing with Vibium means capturing a screenshot as a known-good baseline, then re-capturing on every run and diffing the two images — if the pixels differ beyond a threshold, the UI changed and the test fails so you can review it. Vibium is AI-native browser automation built on WebDriver BiDi that ships as a single Go binary and auto-downloads Chrome for Testing, so pip install vibium is the entire setup. Its screenshot() returns raw PNG bytes rather than a file path, which lets an image flow straight into a diff library or a hosted service without touching disk. Because Vibium auto-waits for page load, baselines are fully rendered instead of half-painted. Visual regression catches the bugs that assertions miss — a broken grid, a clipped button, a font that failed to load — all of which still pass a text() check yet look wrong to a human. Created by Jason Huggins, co-creator of Selenium and Appium, Vibium gives you this pipeline with almost no framework overhead.

What does the visual regression pipeline look like?

The pipeline has five stages: render the page, capture a baseline once, re-capture on every run, diff against the baseline, and gate your build on the result. Everything after the first baseline is automatic.

The only manual step is approving a baseline — the first time you record it, and again whenever you intentionally change the UI and re-bless the new look. Every run in between is a mechanical compare. That is the whole discipline: freeze what "correct" looks like, then let the machine shout when reality drifts from it.
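The record-once, compare-every-run flow can be sketched in plain Python. This is a minimal sketch and not part of Vibium: `run_visual_check`, its `compare` callback, and the `approve` flag are hypothetical names, and the comparison is injected so that any diff function can be plugged in.

```python
from pathlib import Path
from typing import Callable


def run_visual_check(
    name: str,
    png: bytes,
    baseline_dir: Path,
    compare: Callable[[bytes, bytes], bool],
    approve: bool = False,
) -> str:
    """Record a baseline when approved, otherwise compare against it.

    Returns "recorded", "missing", "passed", or "failed".
    """
    baseline_file = baseline_dir / f"{name}.png"
    if approve:
        # The only manual step: a human decided this image is "correct".
        baseline_dir.mkdir(parents=True, exist_ok=True)
        baseline_file.write_bytes(png)
        return "recorded"
    if not baseline_file.exists():
        # No reference yet: report it instead of silently blessing the capture.
        return "missing"
    # Every other run is a mechanical compare against the frozen reference.
    return "passed" if compare(baseline_file.read_bytes(), png) else "failed"
```

In a real suite, `png` would be the bytes returned by screenshot() and `compare` would wrap a pixel diff with a tolerance rather than plain byte equality.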

This guide builds that pipeline from scratch in Python (Vibium's sync API is the most concise for scripts), shows the JavaScript equivalent, and then covers the parts that separate a toy demo from a suite you can trust in CI: stabilizing flaky pixels, masking dynamic content, choosing thresholds, and organizing baselines per browser and viewport.

How do I record a baseline screenshot?

Recording a baseline means opening the page once, capturing it as PNG bytes, and writing those bytes to a reference file that every future run compares against. Run this script a single time to establish the reference.

import os

from vibium import browser_sync as browser

# Make sure the baseline folder exists before writing into it.
os.makedirs("baseline", exist_ok=True)

vibe = browser.launch(window_size=(1280, 800))
vibe.go("https://example.com")

png = vibe.screenshot(full_page=True)
with open("baseline/home.png", "wb") as f:
    f.write(png)

vibe.quit()

This opens the page at a pinned 1280×800 window, captures the full scrolling document as PNG bytes, and saves them as baseline/home.png. Pinning the window size matters: layout is a function of viewport width, so an unpinned window produces baselines that drift when you run the suite on a different machine.

Each line does one job:

browser.launch(window_size=(1280, 800)) — starts Chrome over WebDriver BiDi at a fixed size; no driver download needed.
vibe.go(url) — navigates and waits for the load event so the capture is complete, not mid-render.
vibe.screenshot(full_page=True) — captures the entire scrolling page as PNG bytes. See the screenshot command reference for every option.
open(...).write(png) — you decide where the bytes land: a baseline file, an upload, or an in-memory buffer.
vibe.quit() — closes the browser and cleans up.

The bytes-not-files design is the quiet superpower here. Because screenshot() hands you a bytes object, the image can go straight into Pillow, into an HTTP request to a visual-testing API, or into a hash — no temp file, no cleanup.
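As one illustration of the bytes-first design, the sketch below fingerprints a capture with the standard library's hashlib. The `fingerprint` helper is a hypothetical name, and the byte strings are placeholders standing in for what screenshot() returns. A hash only answers "byte-identical or not": two visually identical PNGs can hash differently if their encoding or metadata differs, so treat it as a fast pre-check and not a replacement for the pixel diff.

```python
import hashlib


def fingerprint(png: bytes) -> str:
    """Return a SHA-256 hex digest of the raw PNG bytes."""
    return hashlib.sha256(png).hexdigest()


# Placeholder bytes; in a real run these would come from vibe.screenshot().
baseline_bytes = b"\x89PNG placeholder for the approved capture"
current_bytes = b"\x89PNG placeholder for the approved capture"

if fingerprint(baseline_bytes) == fingerprint(current_bytes):
    print("Byte-identical capture; the pixel diff can be skipped.")
else:
    print("Bytes differ; fall through to the pixel diff.")
```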

How do I compare a new run against the baseline?

To compare, capture the current page, load the saved baseline, and diff them pixel by pixel; if the difference exceeds your tolerance, fail the test and write a diff image so a human can see exactly what moved. The free Pillow library does the image math.

from vibium import browser_sync as browser
from PIL import Image, ImageChops
import io
import os

vibe = browser.launch(window_size=(1280, 800))
vibe.go("https://example.com")
current = Image.open(io.BytesIO(vibe.screenshot(full_page=True))).convert("RGB")
vibe.quit()

baseline = Image.open("baseline/home.png").convert("RGB")

if current.size != baseline.size:
    raise AssertionError(
        f"Size changed: baseline {baseline.size} vs current {current.size}"
    )

diff = ImageChops.difference(baseline, current)
bbox = diff.getbbox()
if bbox is not None:
    os.makedirs("diff", exist_ok=True)  # the diff folder may not exist yet
    diff.save("diff/home.png")
    raise AssertionError(f"Visual change detected in region {bbox}; see diff/home.png")
print("No visual change.")

ImageChops.difference() returns a per-pixel delta image; getbbox() is None only when the two images match exactly. The explicit size check up front matters when a layout shift changes the page height: Pillow does not reliably reject inputs of different sizes and may compare only their overlapping region, which can hide the change. For real sites you will almost never want an exact match, though; the next section adds a tolerance.
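To see what getbbox() reports without launching a browser, here is a small self-contained sketch on synthetic images. The 4×4 canvases and the single altered pixel are made up for illustration; only Pillow's Image and ImageChops are used.

```python
from PIL import Image, ImageChops

# Two identical 4x4 white canvases stand in for baseline and current captures.
baseline = Image.new("RGB", (4, 4), (255, 255, 255))
current = baseline.copy()

# Identical images produce an all-black delta, so there is no bounding box.
identical_bbox = ImageChops.difference(baseline, current).getbbox()

# Change one pixel at x=2, y=1 and diff again.
current.putpixel((2, 1), (255, 0, 0))
changed_bbox = ImageChops.difference(baseline, current).getbbox()

print(identical_bbox)  # None
print(changed_bbox)    # (2, 1, 3, 2): left, upper, right and lower (both exclusive)
```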

How do I set a pixel-difference threshold?

A threshold is the percentage of changed pixels you are willing to ignore before failing the test — small enough to catch real regressions, large enough to survive sub-pixel anti-aliasing noise. Demanding a perfect match makes the suite unusably brittle.

Convert the diff to a single number by counting non-black pixels in the delta image and dividing by the total:

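def diff_ratio(baseline: Image.Image, current: Image.Image) -> float:
    """Fraction of pixels that differ, from 0.0 (identical) to 1.0 (all changed)."""
    diff = ImageChops.difference(baseline, current)
    # Both images are RGB, so each pixel is an (r, g, b) delta tuple.
    # Checking every channel avoids a grayscale conversion, which can round
    # a one-step change in a single channel down to zero.
    changed = sum(1 for pixel in diff.getdata() if any(pixel))
    total = baseline.width * baseline.height
    return changed / total

ratio = diff_ratio(baseline, current)
THRESHOLD = 0.002  # allow 0.2% of pixels to differ
if ratio > THRESHOLD:
    # Recompute the delta image here; the one inside diff_ratio() is local.
    ImageChops.difference(baseline, current).save("diff/home.png")
    raise AssertionError(f"Changed {ratio:.4%} of pixels (limit {THRESHOLD:.2%})")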

Pick a threshold empirically: run the same page twice with no changes, measure the natural noise floor, and set the limit a little above it. Text-heavy pages with anti-aliased fonts need a touch more tolerance than flat-color UI. A common starting point is 0.1%–0.5%; tighten it once the suite is stable.
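The "measure the noise floor, then add a little headroom" rule can be written down directly. This is a minimal sketch: `suggest_threshold` and its `margin` default of 1.5 are hypothetical choices, and the measured ratios are made-up numbers standing in for real no-change runs.

```python
def suggest_threshold(noise_ratios: list[float], margin: float = 1.5) -> float:
    """Return a threshold a little above the worst observed no-change diff ratio."""
    if not noise_ratios:
        raise ValueError("measure at least one no-change run first")
    return max(noise_ratios) * margin


# Hypothetical diff ratios from five back-to-back runs with no UI changes.
measured = [0.0004, 0.0007, 0.0005, 0.0006, 0.0004]
threshold = suggest_threshold(measured)
print(f"noise floor {max(measured):.4%}, suggested threshold {threshold:.4%}")
```

Using the maximum instead of the mean keeps the limit above every run you actually observed; a smaller margin makes the suite stricter at the cost of more noise failures.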

The table below maps failure modes to the right knob so you spend tolerance where it helps and stay strict where it counts.

Symptom	Likely cause	Fix
Fails on every run with tiny scattered diffs	Anti-aliasing / font rendering noise	Raise threshold slightly; render at a fixed device scale
Fails only in CI, passes locally	Different OS font stack or GPU	Match CI browser + fonts; run baselines in the same container
Whole image flags as changed	Viewport or scale differs from baseline	Pin `window_size`; assert `size` equality first
Random regions flip each run	Ads, timestamps, carousels, avatars	Mask the region before diffing (see below)
Passes but a real bug slipped through	Threshold too loose	Lower threshold; add a component-level baseline

How do I make visual tests stable?

The number-one cause of flaky visual tests is content that legitimately changes between runs, so stability comes from removing every source of nondeterminism before you capture. Vibium already waits for page load; you handle the rest.

Pin the viewport. Layout is a function of width, so a fixed window_size is non-negotiable for reproducible baselines. Do it at launch:

vibe = browser.launch(window_size=(1280, 800))

Wait for a late element. A full-page screenshot taken while an image or web font is still loading produces a false diff. Force the page to finish by reading a late element — finding it is the wait, because find() auto-waits for actionability:

vibe.go("https://example.com/long-page")
vibe.find("footer").text()   # blocks until the footer is present and rendered
png = vibe.screenshot(full_page=True)

Freeze animations and time. CSS transitions, spinners, and Date.now() clocks all vary frame to frame. Inject a stylesheet that disables animation and neutralize obvious motion before capturing:

vibe.find("body").html()  # ensure DOM is ready first
 
vibe.evaluate("""
  const style = document.createElement('style');
  style.textContent = `*, *::before, *::after {
    animation-duration: 0s !important;
    animation-delay: 0s !important;
    transition-duration: 0s !important;
    transition-delay: 0s !important;
    caret-color: transparent !important;
  }`;
  document.head.appendChild(style);
""")

Mask dynamic regions. For content that changes every load — ads, live timestamps, user avatars — paint over the region before you diff rather than fighting it. Overwrite the element's box with a flat color in both baseline and current captures so the pixels always match:

from PIL import ImageDraw

# Read the element's box while the browser is still open.
box = vibe.find(".live-timestamp").bounds()   # {x, y, width, height}

# In your diff step, paint that rectangle a solid color in both images.
for img in (baseline, current):
    d = ImageDraw.Draw(img)
    d.rectangle(
        [box["x"], box["y"], box["x"] + box["width"], box["y"] + box["height"]],
        fill=(0, 0, 0),
    )

Masking is more robust than hiding the element with display:none, because removing an element can reflow the surrounding layout and shift every pixel below it.
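The effect of masking is easy to verify on synthetic images. In this sketch, `mask_regions` is a hypothetical helper that returns masked copies instead of painting in place, and the box dictionary is assumed to have the same {x, y, width, height} shape shown above; no browser is involved.

```python
from PIL import Image, ImageChops, ImageDraw


def mask_regions(img: Image.Image, boxes: list[dict]) -> Image.Image:
    """Return a copy of img with each box painted solid black."""
    masked = img.copy()
    draw = ImageDraw.Draw(masked)
    for box in boxes:
        # ImageDraw rectangles include both corners, hence the -1.
        x1 = box["x"] + box["width"] - 1
        y1 = box["y"] + box["height"] - 1
        draw.rectangle([box["x"], box["y"], x1, y1], fill=(0, 0, 0))
    return masked


# Synthetic captures: identical except for a 3x2 "timestamp" region.
baseline = Image.new("RGB", (10, 10), (255, 255, 255))
current = baseline.copy()
for x in range(4, 7):
    for y in range(2, 4):
        current.putpixel((x, y), (10, 20, 30))

timestamp_box = {"x": 4, "y": 2, "width": 3, "height": 2}

unmasked = ImageChops.difference(baseline, current).getbbox()
masked = ImageChops.difference(
    mask_regions(baseline, [timestamp_box]),
    mask_regions(current, [timestamp_box]),
).getbbox()

print(unmasked)  # (4, 2, 7, 4)
print(masked)    # None
```

Working on copies keeps the original captures intact, which is useful if you also want to save the unmasked image as a CI artifact.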

How do I capture just one component?

To test a single component, find it first and call screenshot() on the element instead of the page — you get PNG bytes of just that region. Component baselines produce far fewer false positives than full-page diffs because unrelated parts of the page cannot trip them.

card = vibe.find(".pricing-card")
png = card.screenshot()
with open("baseline/pricing-card.png", "wb") as f:
    f.write(png)

This is the highest-leverage habit in visual testing. A full-page baseline for a marketing site breaks the moment anyone edits the footer; a pricing-card.png baseline only breaks when the pricing card actually changes. Build a small library of component baselines for your buttons, cards, nav bars, and modals, and reserve full-page shots for the handful of layouts whose overall composition you truly need to protect.

Vibium's semantic find() helps you target components without brittle CSS. Selecting by accessible role and text survives class-name churn and refactors:

# Screenshot a component located by what users perceive, not by hashed classes.
banner = vibe.find(role="alert")
banner.screenshot()
 
signup = vibe.find(role="button", text="Sign up free")
signup.screenshot()

How do I write this in JavaScript?

The JavaScript flow mirrors Python — launch, navigate, screenshot() for bytes, then diff — using Vibium's sync client and the pixelmatch + pngjs libraries for the image comparison.

const fs = require('fs')
const { browser } = require('vibium/sync')
const { PNG } = require('pngjs')
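// Recent pixelmatch releases ship as ES modules; if require() fails, pin an older CommonJS release or use import.
const pixelmatch = require('pixelmatch')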
 
const bro = browser.launch({ windowSize: { width: 1280, height: 800 } })
const page = bro.page()
page.go('https://example.com')
 
// Wait for a late element, then capture the full page as PNG bytes.
page.find('footer').text()
const currentBytes = page.screenshot({ fullPage: true })
bro.close()
 
const current = PNG.sync.read(Buffer.from(currentBytes))
const baseline = PNG.sync.read(fs.readFileSync('baseline/home.png'))
 
if (current.width !== baseline.width || current.height !== baseline.height) {
  throw new Error(`Size changed: ${baseline.width}x${baseline.height} vs ${current.width}x${current.height}`)
}
 
const { width, height } = baseline
const diff = new PNG({ width, height })
const changed = pixelmatch(baseline.data, current.data, diff.data, width, height, { threshold: 0.1 })
 
const ratio = changed / (width * height)
if (ratio > 0.002) {
  fs.mkdirSync('diff', { recursive: true }) // the diff folder may not exist yet
  fs.writeFileSync('diff/home.png', PNG.sync.write(diff))
  throw new Error(`Changed ${(ratio * 100).toFixed(3)}% of pixels; see diff/home.png`)
}
console.log('No visual change.')

pixelmatch's own threshold option (0–1) controls per-pixel color sensitivity, while the ratio check controls how many pixels may differ overall. The Pillow version has only the second knob: it counts any non-zero delta as a changed pixel. For a first baseline, run the script once with the diff logic commented out and write currentBytes to baseline/home.png.
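To give the Python pipeline the same two knobs, a per-pixel tolerance can be applied before counting. The sketch below is an assumption-laden illustration: `changed_ratio` is a hypothetical helper, it works on plain lists of RGB tuples (what `list(img.getdata())` yields for an RGB Pillow image), and it uses a simple maximum channel delta instead of pixelmatch's perceptual color metric, so the tolerance values are not interchangeable between the two.

```python
Pixel = tuple[int, int, int]


def changed_ratio(baseline: list[Pixel], current: list[Pixel], tolerance: int = 0) -> float:
    """Fraction of pixels whose largest channel delta exceeds `tolerance` (0-255)."""
    if not baseline or len(baseline) != len(current):
        raise ValueError("images must be non-empty and contain the same number of pixels")
    changed = sum(
        1
        for a, b in zip(baseline, current)
        if max(abs(ca - cb) for ca, cb in zip(a, b)) > tolerance
    )
    return changed / len(baseline)


# Four synthetic pixels: one barely shifted (anti-aliasing-like), one clearly changed.
base = [(255, 255, 255), (255, 255, 255), (0, 0, 0), (10, 10, 10)]
curr = [(255, 255, 255), (253, 254, 255), (0, 0, 0), (200, 10, 10)]

print(changed_ratio(base, curr))               # 0.5
print(changed_ratio(base, curr, tolerance=8))  # 0.25
```

With a small tolerance the anti-aliasing-like pixel stops counting, so the overall ratio threshold can stay tight without tripping on rendering noise.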

How do I run visual regression in CI?

Running in CI means executing the suite headless, comparing against baselines committed to your repository, and failing the build plus uploading the diff image whenever pixels drift beyond the threshold. Headless keeps runs fast and deterministic:

vibe = browser.launch(headless=True, window_size=(1280, 800))

A reliable CI setup follows a few rules:

Commit baselines to Git. Treat approved PNGs as source of truth. A pull request that changes the UI must also update the baseline in the same commit, which makes the visual change reviewable in the diff.
Render in a fixed container. Fonts and GPU rasterization differ across operating systems, so generate and verify baselines in the same Docker image. A baseline recorded on macOS will flag noise against a Linux CI runner. See running Vibium on a server for a headless container setup.
Upload diffs as artifacts. When a test fails, publish diff/*.png from the job so reviewers see exactly what moved without re-running anything locally.
Fail closed, approve deliberately. A diff over threshold should fail the build. Updating a baseline should be an explicit, reviewed action — never an automatic "just overwrite it" step.
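The "fail closed" rule comes down to turning per-capture results into a process exit code. This is a minimal sketch with hypothetical names (`VisualResult`, `gate`) and made-up numbers; a real suite would fill the results from its own diff step.

```python
import sys
from dataclasses import dataclass


@dataclass
class VisualResult:
    name: str
    ratio: float       # fraction of changed pixels for this capture
    threshold: float   # allowed fraction for this capture


def gate(results: list[VisualResult]) -> int:
    """Return an exit code: 0 if every capture is within its threshold, else 1."""
    failures = [r for r in results if r.ratio > r.threshold]
    for r in failures:
        print(f"FAIL {r.name}: {r.ratio:.4%} changed (limit {r.threshold:.4%})", file=sys.stderr)
    return 1 if failures else 0


# Hypothetical results for two captures.
results = [
    VisualResult("home", 0.0004, 0.002),
    VisualResult("pricing-card", 0.0310, 0.002),
]
exit_code = gate(results)
print(f"exit code: {exit_code}")  # a CI entry point would call sys.exit(exit_code)
```

Note that nothing in the gate writes a baseline: updating a reference stays a separate, explicitly reviewed action.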

Organize baselines by the axes that affect rendering so a Chrome-desktop shot never gets compared to a mobile one:

baseline/
  chrome-1280x800/
    home.png
    pricing-card.png
  chrome-375x812/
    home.png

Deriving the folder from the browser and viewport at runtime keeps this automatic as you add form factors.
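One way to derive that folder is a small path helper. This is a sketch under assumptions: `baseline_path` is a hypothetical function, and the browser name and viewport are passed in explicitly; the guide does not show how to read them back from Vibium.

```python
from pathlib import Path


def baseline_path(root: Path, browser_name: str, width: int, height: int, name: str) -> Path:
    """Build e.g. baseline/chrome-1280x800/home.png from the rendering axes."""
    return root / f"{browser_name}-{width}x{height}" / f"{name}.png"


desktop = baseline_path(Path("baseline"), "chrome", 1280, 800, "home")
mobile = baseline_path(Path("baseline"), "chrome", 375, 812, "home")
print(desktop.as_posix())  # baseline/chrome-1280x800/home.png
print(mobile.as_posix())   # baseline/chrome-375x812/home.png
```

Passing the same width and height to both browser.launch(window_size=...) and this helper keeps the capture and its baseline folder from drifting apart.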

Vibium vs dedicated visual-testing platforms

Vibium gives you the capture-and-diff primitives; hosted platforms like Percy, Applitools, or Chromatic add a cloud baseline store, a review UI, and cross-browser rendering farms. The honest trade-off is control and cost versus convenience.

Capability	Vibium + Pillow/pixelmatch	Hosted visual platform
Cost	Free, open source	Paid, usually per-screenshot or per-seat
Screenshot capture	Built in (PNG bytes)	Built in
Diff engine	You wire it (a few lines)	Managed, tuned out of the box
Baseline storage	Your Git repo	Their cloud, versioned
Review / approval UI	Build your own or read diff PNGs	Polished web dashboard
Cross-browser farm	Bring your own runners	Included
AI "looks right" checks	`check()` (built in)	Varies by vendor
Data residency	Fully self-hosted	Vendor-hosted

Choose Vibium's DIY pipeline when you want zero per-image cost, everything in your own repo and CI, and a tiny dependency surface — or when you are testing an internal app where sending screenshots to a third party is off the table. Choose a hosted platform when you need a large cross-browser matrix, a non-engineer approval workflow, and are happy to pay to skip building baseline management yourself. Many teams start with the Vibium pipeline and only graduate to a platform once the review workflow — not the capture — becomes the bottleneck. If you are comparing the underlying engine against other tools, see Vibium vs Playwright and Vibium vs Selenium for the full picture.

Where do AI checks fit alongside pixel diffs?

Pixel diffs and Vibium's AI check() answer different questions, and the strongest suites use both. A pixel diff asks "did anything change from the exact baseline?" while check() asks "does this look right?" — the former is precise but noisy, the latter is tolerant but fuzzy.

# Pixel diff: strict, catches a 3px shift, but flags harmless anti-aliasing too.
# AI check: ignores sub-pixel noise, judges intent — great as a coarse guard.
result = vibe.check("the pricing table shows three plans and no layout is broken")
assert result.passed, result.reason

Use pixel diffing as your deterministic gate on components whose exact appearance is contractual — a logo, a brand color, a checkout button. Use check() for looser, higher-level guards where an exact-match baseline would be too brittle, such as "no content overlaps" or "the hero image loaded." Because check() needs no baseline file, it is also the fastest way to add a smoke-level visual guard to a page you have not yet blessed with a reference image.

Common gotchas

A few sharp edges catch almost everyone the first time they wire up visual regression:

Baselines from the wrong environment. A PNG recorded on your laptop will not match a Linux CI runner because font hinting and rasterization differ. Always record baselines where you verify them.
Forgetting the viewport. An unpinned window changes width between runs and flags the entire page. Set window_size at launch, every time.
Capturing mid-render. Screenshotting before fonts and images settle yields false diffs. Read a late element first so find() blocks until the page has reached its final state.
Threshold theatre. A threshold so high that it never fails is worse than no test — it gives false confidence. Set it just above the measured noise floor.
Un-reviewed baseline updates. Auto-overwriting baselines on failure defeats the purpose. Baseline changes must be reviewed like any other diff.

Get these five right and the suite becomes low-noise and trustworthy — which is the whole point, because a flaky visual test gets muted, and a muted test catches nothing.

Frequently asked questions

What is visual regression testing with Vibium?

Visual regression testing with Vibium captures a screenshot of a page or component as a known-good baseline, then re-captures on every run and compares the two images pixel by pixel. If they differ beyond a threshold, the UI changed and the test fails so a human can review it.

How does Vibium capture screenshots for visual testing?

Vibium's screenshot() returns raw PNG bytes, not a file path. Call vibe.screenshot(full_page=True) for the whole scrolling document, or element.screenshot() for one component. Because the bytes come back in memory, you can diff or upload them without writing an intermediate file.

How do I stop visual regression tests from being flaky?

Pin the viewport with window_size, wait for a late-loading element before capturing, freeze animations and time, and mask dynamic regions like dates and ads. Vibium already auto-waits for page load, so the remaining flakiness comes from content that legitimately changes between runs.

Can I compare Vibium screenshots without a paid service?

Yes. Use the free Pillow library: open the baseline and the current PNG, call ImageChops.difference(), and check getbbox(). If it returns a bounding box, pixels changed. This gives you a full local diff pipeline with no hosted visual-testing subscription required.

How do I run Vibium visual regression tests in CI?

Run Vibium headless with browser.launch(headless=True), commit approved baselines to the repo, and fail the build when a diff exceeds the threshold. Upload the generated diff image as a CI artifact so reviewers can see exactly what moved before approving or rejecting the change.

Should I use full-page or component screenshots for visual regression?

Prefer component screenshots for most assertions because unrelated page changes cannot trip them, which keeps the suite low-noise. Use full-page screenshots for a small set of critical layouts — a landing page or checkout — where the overall composition itself is what you need to protect.

Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.
