Data-Driven Testing with Vibium
Data-driven testing with Vibium: feed one browser test many rows from arrays, CSV, or JSON, loop over cases, and keep the automation logic in one place.
Data-driven testing with Vibium means writing one browser flow (navigate, find, type, click, assert) and running it against many rows of input and expected output, so new coverage is a new line of data rather than a new copy of the test. Because Vibium is a plain library on top of WebDriver BiDi (created by Selenium and Appium co-creator Jason Huggins), it imposes no data format: you keep your cases in a JavaScript array, a JSON file, or a CSV, load them with your language's normal tools, and loop. Give each row its own fresh page or isolated context so logins and cookies never bleed between cases. Vibium's find() auto-waits for the element to become actionable, so your parameterized flow stays free of sleep() and reads the same for one row or five hundred. Pair the loop with your existing runner, such as Jest, Mocha, or pytest's @pytest.mark.parametrize, and one maintainable function verifies an entire spreadsheet of scenarios. This guide shows the pattern in JavaScript first, then Python, with complete examples.
What is data-driven testing, and why use it with Vibium?
Data-driven testing separates the steps of a test from the data those steps run against. You write the browser journey once — say, "log in, then check the greeting" — and supply a table of inputs and expected results. The runner executes the journey for every row, reporting each as its own pass or fail.
The payoff is leverage. A login form has valid users, locked accounts, wrong passwords, empty fields, and SQL-injection probes. Written the naive way that is five near-identical copies of one test; written data-driven it is one function and a five-row table. Adding the sixth case is one more row.
Vibium fits this model cleanly because it is only the browser layer. It drives Chrome, finds elements, and asserts with check(), but it ships no runner, no data loader, and no opinion about where your rows live. That means you use the data tools you already know and treat Vibium like any other library your loop calls. New to the tool itself? Start with what is Vibium and install Vibium.
How do I run one Vibium flow over many data rows in JavaScript?
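Put your cases in an array of objects, launch one browser, and run the flow once per row, giving each row its own page. The example uses map so it can collect a pass/fail result per row; a for...of loop works just as well. This is the whole pattern; everything else is a refinement of it.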
const { browser } = require('vibium/sync');

// The data: each object is one test case.
const cases = [
  { user: 'alice', pass: 'correct-horse', expect: 'Welcome, Alice' },
  { user: 'bob', pass: 'hunter2', expect: 'Welcome, Bob' },
  { user: 'locked', pass: 'whatever', expect: 'Account locked' },
  { user: '', pass: '', expect: 'Username is required' },
];

// The flow: written once, runs per row.
function runLogin(bro, { user, pass, expect }) {
  const page = bro.page();
  page.go('https://example.com/login');
  page.find({ label: 'Username' }).type(user);
  page.find({ label: 'Password' }).type(pass);
  page.find({ role: 'button', text: 'Sign in' }).click();
  const banner = page.find('.flash').text();
  const ok = banner.includes(expect);
  console.log(`${ok ? 'PASS' : 'FAIL'} user="${user}" → "${banner}"`);
  page.close();
  return ok;
}

const bro = browser.launch({ headless: true });
try {
  const results = cases.map((row) => runLogin(bro, row));
  const passed = results.filter(Boolean).length;
  console.log(`\n${passed}/${cases.length} cases passed`);
} finally {
  bro.close();
}

Two design choices make this solid. The browser launches once and is reused across every row, which is fast: spinning up Chrome per case would dominate the runtime. And each row gets a fresh page via bro.page() then page.close(), so one row's navigation and form state never carry into the next.
Notice the flow function takes the row as a plain object and never mentions specific values. That is the essence of data-driven design: the function knows how to log in, the array knows what to log in with.
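One gap worth noting: if a step inside the flow throws (for example, because an element never appears), the map call stops at that row and the remaining rows never run. The sketch below shows one way to record a result per row instead. runAll and its result shape are hypothetical names, not part of Vibium, and demoFlow is a stand-in for a function like runLogin so the example runs without a browser.

```javascript
// Hypothetical helper: run `flow` for every row and never let one row abort the rest.
// `flow(row)` should return true/false, or throw if a browser step fails.
function runAll(cases, flow) {
  return cases.map((row, index) => {
    try {
      return { index, row, ok: Boolean(flow(row)), error: null };
    } catch (err) {
      // A thrown error counts as a failure for this row only.
      const message = err && err.message ? err.message : String(err);
      return { index, row, ok: false, error: message };
    }
  });
}

// Stand-in data and flow so the sketch runs without a browser.
const demoCases = [{ user: 'alice' }, { user: 'locked' }, { user: '' }];
const demoFlow = (row) => {
  if (row.user === '') throw new Error('Username field not found');
  return row.user === 'alice';
};

const results = runAll(demoCases, demoFlow);
const failed = results.filter((r) => !r.ok);
console.log(`${results.length - failed.length}/${results.length} cases passed`);
for (const f of failed) {
  console.log(`FAIL row ${f.index} ${JSON.stringify(f.row)} ${f.error ?? ''}`);
}
```

With a real browser, the call would look like runAll(cases, (row) => runLogin(bro, row)), leaving the flow function itself unchanged.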
How do I read test data from a JSON or CSV file?
Keep the data out of your source file so non-programmers can edit it and version control shows clean diffs. Vibium does nothing special here — you read the file with Node's standard tools and hand the parsed rows to the same loop.
For JSON, require or fs.readFileSync plus JSON.parse is all you need:
const fs = require('fs');

// cases.json: [{ "user": "alice", "pass": "...", "expect": "Welcome, Alice" }, ...]
const cases = JSON.parse(fs.readFileSync('./cases.json', 'utf8'));

For CSV, a tiny parser handles simple, comma-clean files; reach for a library such as csv-parse the moment your data contains quoted commas or newlines:
const fs = require('fs');

function loadCsv(path) {
  // Split on \n or \r\n so files saved on Windows do not leave a stray \r in the last cell.
  const [header, ...lines] = fs.readFileSync(path, 'utf8').trim().split(/\r?\n/);
  const keys = header.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    return Object.fromEntries(keys.map((k, i) => [k, cells[i]]));
  });
}

// cases.csv:
// user,pass,expect
// alice,correct-horse,Welcome Alice
const cases = loadCsv('./cases.csv');

Because loadCsv returns the same array-of-objects shape as the inline version, the loop above does not change by a single character. Swapping the data source never touches the automation. For the fields your flow reads on each row, the find() reference covers every selector strategy: role, label, text, placeholder, and testid.
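To see why the naive split stops being safe, here is a sketch comparing it with a small quote-aware splitter. splitCsvLine is a hypothetical helper written for illustration: it handles double-quoted fields and doubled-quote escapes on a single line, but not newlines inside quotes, so a real CSV library remains the better choice for anything beyond simple data.

```javascript
// Hypothetical helper: split one CSV line, honoring "quoted, fields" and "" escapes.
function splitCsvLine(line) {
  const cells = [];
  let cell = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        cell += '"'; // doubled quote inside a quoted field is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        cell += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      cells.push(cell);
      cell = '';
    } else {
      cell += ch;
    }
  }
  cells.push(cell);
  return cells;
}

const line = 'alice,correct-horse,"Welcome, Alice"';
console.log(line.split(','));    // naive split: 4 cells, the greeting is cut in two
console.log(splitCsvLine(line)); // quote-aware: 3 cells, the comma stays in the greeting
```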
How do I generate a test matrix from combined inputs?
Build the data instead of hand-writing it when the cases are the cross-product of several variables — browsers times locales times plans, for example. A short generator turns a few small lists into every combination, so you cover a grid without typing out dozens of rows by hand.
// Cross-product of independent dimensions → one row per combination.
function matrix(dimensions) {
  return Object.entries(dimensions).reduce(
    (acc, [key, values]) =>
      acc.flatMap((row) => values.map((v) => ({ ...row, [key]: v }))),
    [{}]
  );
}

const cases = matrix({
  locale: ['en', 'de', 'ja'],
  plan: ['free', 'pro'],
});
// → 6 rows: {locale:'en',plan:'free'}, {locale:'en',plan:'pro'}, ...

Feed cases straight into the same loop you already have. The value of generating the grid is that adding a fourth locale or a new plan tier expands coverage automatically: you edit one array, and every combination that involves it appears. Be deliberate, though: a full cross-product grows multiplicatively, so cap the dimensions you truly need to combine and pull the rest into separate, smaller tables.
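The multiplicative growth is easy to check before generating anything. The sketch below computes the grid size from the dimension lists and refuses grids above a limit. matrixSize, assertMatrixWithin, and the extra theme and viewport dimensions are hypothetical, chosen only to illustrate the arithmetic.

```javascript
// Hypothetical helper: number of rows a full cross-product would produce.
function matrixSize(dimensions) {
  return Object.values(dimensions).reduce((n, values) => n * values.length, 1);
}

// Hypothetical guard: throw instead of silently generating a runaway grid.
function assertMatrixWithin(dimensions, maxRows) {
  const size = matrixSize(dimensions);
  if (size > maxRows) {
    throw new Error(`matrix would produce ${size} rows, limit is ${maxRows}`);
  }
  return size;
}

const small = { locale: ['en', 'de', 'ja'], plan: ['free', 'pro'] };
const large = {
  ...small,
  theme: ['light', 'dark'],
  viewport: ['mobile', 'tablet', 'desktop'],
};

console.log(matrixSize(small)); // 3 * 2 = 6
console.log(matrixSize(large)); // 3 * 2 * 2 * 3 = 36
```

Calling assertMatrixWithin(large, 20) would throw here, which is a cheap signal that some dimension belongs in its own smaller table.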
When inputs are less structured — fuzzing a search box, stress-testing a form — generate randomized rows behind a seeded function so failures stay reproducible:
// Seeded pseudo-random so a failing run can be replayed exactly.
function seeded(seed) {
  return () => {
    // Math.imul keeps the multiply in exact 32-bit integer arithmetic.
    seed = (Math.imul(seed, 1103515245) + 12345) & 0x7fffffff;
    return seed / 0x80000000; // always in [0, 1), so the index below stays in range
  };
}

const rnd = seeded(42);
const terms = ['a', ' ', '<script>', '日本語', '"; DROP TABLE'];
const cases = Array.from({ length: 20 }, () => ({
  term: terms[Math.floor(rnd() * terms.length)],
}));

Seeding matters because an unseeded random generator makes a red build impossible to reproduce: you would never know which input broke. Log the seed with the results and any failure becomes a one-line replay.
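As a sketch of what that replay could look like, the example below reads the seed from an environment variable and logs it on every run. TEST_SEED and buildCases are hypothetical names chosen for illustration, and the generator is a small 32-bit linear congruential generator, not a Vibium feature.

```javascript
// 32-bit linear congruential generator: same seed, same sequence.
function seeded(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1103515245) + 12345) & 0x7fffffff;
    return state / 0x80000000; // in [0, 1)
  };
}

// Hypothetical builder: derive every random row from one seed.
function buildCases(seed, count) {
  const rnd = seeded(seed);
  const terms = ['a', ' ', '<script>', '日本語', '"; DROP TABLE'];
  return Array.from({ length: count }, () => ({
    term: terms[Math.floor(rnd() * terms.length)],
  }));
}

// TEST_SEED is a hypothetical environment variable for replaying a failed run.
// Unset (or 0) falls back to a time-based seed.
const seed = Number(process.env.TEST_SEED) || Date.now() % 100000;
console.log(`seed=${seed}`); // log it so a red build can be rerun with TEST_SEED=<seed>
const fuzzCases = buildCases(seed, 20);
```

Because every row comes from the one logged number, rerunning with the same TEST_SEED regenerates the identical table.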
Which data-driven approach should I choose?
Match the data source to the size and owner of your test data. Small, developer-owned cases belong inline; large or business-owned data belongs in a file; anything read from a live system belongs behind a generator function.
| Approach | Where data lives | Best for | Trade-off |
|---|---|---|---|
| Inline array | In the test file | A handful of cases, tight dev loop | Adding a case means editing code |
| JSON file | Separate .json | Structured cases, nested inputs | Slightly noisier to hand-edit |
| CSV file | Separate .csv | Big tables, non-devs editing in a spreadsheet | Needs a parser for quoted/edge values |
| Runner parametrize | Test file + runner | pytest / Jest suites wanting per-row reports | Tied to that runner's API |
| Generated | A function/DB/faker | Fuzzing, huge volumes, live fixtures | Non-deterministic unless you seed it |
There is no single "right" one; most real suites mix them. Credential edge cases stay inline next to the test, the 300-row product catalog lives in CSV, and a randomized stress pass calls a generator. Vibium is indifferent to the choice because it only ever sees the parsed row.
How do I do data-driven testing in Python with pytest?
Use @pytest.mark.parametrize to turn one test function into one case per row — pytest reports each independently, so a failure names the exact input. This is the idiomatic Python answer and needs no plugin beyond pytest itself.
import pytest
from vibium import browser_sync as browser

CASES = [
    ("alice", "correct-horse", "Welcome, Alice"),
    ("bob", "hunter2", "Welcome, Bob"),
    ("locked", "whatever", "Account locked"),
    ("", "", "Username is required"),
]


@pytest.fixture(scope="module")
def bro():
    b = browser.launch(headless=True)
    yield b
    b.quit()


@pytest.mark.parametrize("user,password,expected", CASES)
def test_login(bro, user, password, expected):
    page = bro.page()
    try:
        page.go("https://example.com/login")
        page.find(label="Username").type(user)
        page.find(label="Password").type(password)
        page.find(role="button", text="Sign in").click()
        assert expected in page.find(".flash").text()
    finally:
        page.close()  # close the page even when the assertion fails

Run pytest -v and you get four named results (test_login[alice-correct-horse-Welcome, Alice] and friends) instead of one opaque pass/fail. The browser launches once via a module-scoped fixture, while each parameter set gets a fresh page, mirroring the JavaScript design.
To drive the same test from a CSV instead of an inline list, load the rows and hand them to parametrize:
import csv
import pytest
from vibium import browser_sync as browser


def load_cases(path):
    # Expects a header row of: user,password,expected
    with open(path, newline="") as f:
        return [(r["user"], r["password"], r["expected"]) for r in csv.DictReader(f)]


# Uses the same module-scoped `bro` fixture as the previous example.
@pytest.mark.parametrize("user,password,expected", load_cases("cases.csv"))
def test_login_from_csv(bro, user, password, expected):
    page = bro.page()
    try:
        page.go("https://example.com/login")
        page.find(label="Username").type(user)
        page.find(label="Password").type(password)
        page.find(role="button", text="Sign in").click()
        assert expected in page.find(".flash").text()
    finally:
        page.close()  # close the page even when the assertion fails

For the full fixture, screenshot-on-failure, and parallel-run setup this builds on, see how to use Vibium with pytest.
How do I keep each data row isolated so tests stay independent?
Give every row its own context, not just its own page, whenever the flow logs in or writes cookies. A Vibium context is an isolated cookie jar and storage sandbox, so a session created for row one is invisible to row two even though both share the same browser process.
const { browser } = require('vibium/sync');

const users = [
  { name: 'alice', plan: 'pro' },
  { name: 'bob', plan: 'free' },
];

const bro = browser.launch({ headless: true });
try {
  for (const u of users) {
    const ctx = bro.newContext(); // fresh cookies + storage per row
    const page = ctx.newPage();
    page.go('https://app.example.com/login');
    page.find({ label: 'Username' }).type(u.name);
    page.find({ role: 'button', text: 'Log in' }).click();
    const plan = page.find('[data-testid="plan-badge"]').text();
    console.log(`${u.name}: expected ${u.plan}, saw ${plan}`);
    ctx.close(); // wipe this row's session
  }
} finally {
  bro.close();
}

A fresh page (bro.page()) is enough when rows only read public pages, because there is no session to leak. Reach for a fresh context (bro.newContext()) the moment a row authenticates, so a logged-in alice never contaminates bob's assertions. This isolation is the single biggest factor in whether a large data-driven run is trustworthy. For the wider list of causes, read how to write flake-free tests.
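In the loop above, ctx.close() is skipped if any step throws before it. One way to make the cleanup unconditional is a small wrapper with try/finally, sketched below. withFreshContext is a hypothetical helper, and fakeBro is a stub that mimics only the newContext(), newPage(), and close() calls used in this guide so the example runs without Vibium installed.

```javascript
// Hypothetical helper: run `fn` with a page from a fresh context, always closing the context.
function withFreshContext(bro, fn) {
  const ctx = bro.newContext();
  try {
    return fn(ctx.newPage());
  } finally {
    ctx.close(); // runs even when fn throws
  }
}

// Stub browser for illustration; it only counts open contexts.
let openContexts = 0;
const fakeBro = {
  newContext() {
    openContexts += 1;
    return {
      newPage: () => ({ title: 'stub page' }),
      close: () => { openContexts -= 1; },
    };
  },
};

try {
  withFreshContext(fakeBro, () => { throw new Error('element not found'); });
} catch (err) {
  console.log(`row failed: ${err.message}, open contexts: ${openContexts}`);
}
```

Each row's body then becomes the callback, and a failing row can no longer leave its session behind for the next one.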
Can I assert on data rows in plain English with check()?
Yes — Vibium's AI-native check() lets a data row carry a natural-language expectation instead of a brittle selector-and-string comparison, which is ideal when the outcome is easy to describe but awkward to pin to one DOM node.
const { browser } = require('vibium/sync');

const cases = [
  { term: 'laptop', claim: 'at least one product result is shown' },
  { term: 'asdfqwer', claim: 'a no-results or empty-state message is shown' },
];

const bro = browser.launch({ headless: true });
try {
  for (const c of cases) {
    const page = bro.page();
    page.go('https://shop.example.com');
    page.find({ placeholder: 'Search products' }).type(c.term);
    page.find({ role: 'button', text: 'Search' }).click();
    const result = page.check(c.claim);
    console.log(`"${c.term}" → ${result.passed ? 'PASS' : 'FAIL'} (${result.reason})`);
    page.close();
  }
} finally {
  bro.close();
}

check() screenshots the page, sends it to a multimodal model, and returns a structured { passed, reason, confidence }, so your data table can express intent ("an error is shown", "the cart total updated") rather than a fragile exact string. Use deterministic assertions (text(), value(), isVisible()) for values you can name precisely, and reserve check() for outcomes that are visual or fuzzy. The two styles coexist row by row.
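A sketch of how the two styles might coexist in one table: each row carries either an exact expectation or a natural-language claim, and a small dispatcher picks the assertion. The row shapes (selector, expectText, claim) and assertRow are hypothetical, and stubPage only imitates the find().text() and check() calls described above so the example runs without a browser or a model.

```javascript
// Hypothetical dispatcher: exact text check or check() claim, chosen per row.
function assertRow(page, row) {
  if (row.expectText !== undefined) {
    const actual = page.find(row.selector).text();
    return { passed: actual.includes(row.expectText), reason: `saw "${actual}"` };
  }
  if (row.claim !== undefined) {
    const { passed, reason } = page.check(row.claim);
    return { passed, reason };
  }
  throw new Error(`row has neither expectText nor claim: ${JSON.stringify(row)}`);
}

// Stub page for illustration only.
const stubPage = {
  find: () => ({ text: () => 'Welcome, Alice' }),
  check: (claim) => ({ passed: true, reason: `stub accepted: ${claim}`, confidence: 1 }),
};

const rows = [
  { selector: '.flash', expectText: 'Welcome, Alice' }, // deterministic
  { claim: 'a greeting banner is shown' },              // fuzzy, via check()
];
for (const row of rows) {
  console.log(assertRow(stubPage, row));
}
```

Throwing on a row with neither field keeps a malformed data file from passing silently.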
How do I structure a larger data-driven suite?
Keep three things in separate places: the data, the flow (ideally a page object), and the runner glue that loops. When each lives on its own, changing the UI touches only the page object, adding cases touches only the data file, and switching runners touches only the glue.
A maintainable layout looks like this:
tests/
  data/
    logins.csv       # the rows: owned by QA, editable in a spreadsheet
    products.json
  pages/
    login_page.js    # the flow: a page object owning selectors
  login.spec.js      # the glue: loads data, loops, asserts
The page object hides selectors behind intent-named methods, so the data loop never sees a find() call:
// pages/login_page.js
class LoginPage {
  constructor(page) { this.page = page; }

  open() { this.page.go('https://example.com/login'); return this; }

  loginAs(user, pass) {
    this.page.find({ label: 'Username' }).type(user);
    this.page.find({ label: 'Password' }).type(pass);
    this.page.find({ role: 'button', text: 'Sign in' }).click();
    return this;
  }

  flashText() { return this.page.find('.flash').text(); }
}

module.exports = { LoginPage };

Now the spec is pure orchestration (load rows, loop, drive the page object, assert) with zero selectors in sight. This is data-driven and Page Object Model working together: the pattern that keeps a 500-row suite as readable as a 5-row one. For the full treatment of the page layer, see the Page Object Model with Vibium and, for selector choices inside it, selector best practices.
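The glue in login.spec.js might look like the sketch below. runLoginCases is a hypothetical function; the browser and the page-object class are passed in as arguments so the example can run against stubs here, whereas a real spec would pass the launched Vibium browser and the LoginPage class from pages/login_page.js.

```javascript
// Hypothetical glue: loop over rows, drive the page object, collect results.
function runLoginCases(bro, LoginPage, rows) {
  return rows.map((row) => {
    const page = bro.page();
    try {
      const text = new LoginPage(page).open().loginAs(row.user, row.pass).flashText();
      return { user: row.user, ok: text.includes(row.expect), text };
    } finally {
      page.close(); // one fresh page per row, closed even on error
    }
  });
}

// Stubs for illustration only; they mimic the method names used in this guide.
class StubLoginPage {
  constructor(page) { this.page = page; }
  open() { return this; }
  loginAs(user) { this.user = user; return this; }
  flashText() { return this.user ? `Welcome, ${this.user}` : 'Username is required'; }
}
const stubBro = { page: () => ({ close() {} }) };

const outcome = runLoginCases(stubBro, StubLoginPage, [
  { user: 'Alice', pass: 'x', expect: 'Welcome, Alice' },
  { user: '', pass: '', expect: 'Username is required' },
]);
console.log(outcome.map((o) => `${o.ok ? 'PASS' : 'FAIL'} ${o.user || '(empty)'}`).join('\n'));
```

Note that the loop never calls find(): a selector change would touch only the page object.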
What are the common data-driven pitfalls, and how do I avoid them?
Most data-driven failures trace to shared state, selector fragility, or unhelpful reporting — none of which are Vibium's doing, and all of which have a one-line fix. Knowing them up front saves a debugging afternoon.
| Pitfall | Symptom | Fix |
|---|---|---|
| State leaks between rows | Row N passes alone but fails after row N-1 | Fresh newContext() per authenticating row |
| One browser per row | Suite is slow, CI times out | Launch once, reuse; new page/context per row |
| Brittle selectors in the flow | A CSS tweak breaks every row at once | Use role/label/testid, not deep CSS paths |
| Manual sleep() in the loop | Random timeouts on slow rows | Trust Vibium's auto-wait; delete the sleeps |
| Opaque failures | "1 of 500 failed" with no clue which | Use parametrize/named cases; log the row |
| Silent bad data | A typo'd CSV cell passes as valid | Validate rows before the loop; assert on shape |
The through-line is that data-driven testing amplifies whatever you feed it — good structure scales beautifully, and a single hidden coupling multiplies into hundreds of flaky results. Get isolation and selectors right first, then pour in data with confidence. For scaling the run itself across cores, see how to parallelize Vibium tests, and to understand why deleting sleeps is safe, the glossary defines actionability and auto-waiting.
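For the "silent bad data" row in the table, a validation pass before the loop could look like the sketch below. validateRows and REQUIRED_KEYS are hypothetical; the key names match the login examples in this guide. Empty strings are allowed on purpose, since the empty-username case is a legitimate row.

```javascript
// Hypothetical validator: fail fast on malformed rows before any browser work starts.
const REQUIRED_KEYS = ['user', 'pass', 'expect'];

function validateRows(rows, requiredKeys = REQUIRED_KEYS) {
  const problems = [];
  rows.forEach((row, i) => {
    for (const key of requiredKeys) {
      if (typeof row[key] !== 'string') {
        problems.push(`row ${i + 1}: missing or non-string "${key}"`);
      }
    }
    for (const key of Object.keys(row)) {
      if (!requiredKeys.includes(key)) {
        problems.push(`row ${i + 1}: unexpected column "${key}"`);
      }
    }
  });
  if (problems.length > 0) {
    throw new Error(`invalid test data:\n${problems.join('\n')}`);
  }
  return rows;
}

const good = [{ user: 'alice', pass: 'correct-horse', expect: 'Welcome, Alice' }];
const bad = [{ user: 'bob', pasword: 'hunter2', expect: 'Welcome, Bob' }]; // typo'd header

validateRows(good); // passes silently
try {
  validateRows(bad);
} catch (err) {
  console.log(err.message); // names the row and both the missing and the unexpected key
}
```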
Data-driven vs keyword-driven vs BDD: which fits when?
Choose data-driven when the steps are fixed and only the values change; reach for keyword-driven or BDD when the steps themselves vary or when non-programmers must read the cases. All three run on the same Vibium engine — they differ in what varies and who owns the test.
| Style | What varies | Who owns it | Best for |
|---|---|---|---|
| Data-driven | Inputs and expected values | Developers / QA | One journey checked against many values |
| Keyword-driven | The sequence of actions | QA / test analysts | Reusable action vocabulary across many flows |
| BDD (Gherkin) | Scenarios in business language | Product + QA together | Shared, human-readable acceptance specs |
Data-driven is the simplest and covers the majority of real needs, which is why it should be your default. A single login flow rarely needs a keyword table — it needs a table of credentials, and that is exactly what data-driven provides.
The three are not rivals; they layer. A .feature file's Scenario Outline in BDD is literally data-driven testing wearing Gherkin — its Examples: table is your rows, and each row runs the same steps. So adopting BDD does not mean abandoning this pattern; it means expressing it in business language. If your team wants that layer, Vibium with Cucumber / BDD shows the Scenario Outline plus step-definition setup that wraps the exact same Vibium calls used here.
A fair caveat: data-driven testing shines only when the journey genuinely is one flow. If different rows need meaningfully different steps — one skips a screen, another handles a modal — forcing them into a single parameterized function produces a tangle of if branches. That is the signal to split into separate tests or move to a keyword/BDD structure, not to keep bending one flow around divergent data.
Next steps
- What is Vibium: the engine your data loop drives
- Install Vibium: pip install vibium / npm install vibium
- Use Vibium with pytest: the fixture and parametrize setup this builds on
- The Page Object Model with Vibium: where your parameterized flow should live
- Selector best practices: keep each row's find() resilient
- Parallelize Vibium tests: run a big data table across cores
- Follow the 45-day Vibium roadmap and take the course
Frequently asked questions
What is data-driven testing in Vibium?
Data-driven testing runs the same Vibium browser flow against many sets of input and expected output. You keep one automation function — go, find, type, click, assert — and feed it rows from an array, CSV, or JSON file. Each row becomes its own test case, so adding coverage means adding data, not code.
How do I loop test data over a Vibium script?
Load your cases into a list of objects, then iterate. In JavaScript use for...of over an array of records and call your flow function per row; in Python use a for loop or pytest's @pytest.mark.parametrize. Launch one browser, give each row a fresh page or context, and assert on that row's expected value.
Can I read test data from a CSV or JSON file with Vibium?
Yes. Vibium is just a library, so use your language's normal tools. In Node.js read JSON with require or fs.readFileSync plus JSON.parse, and parse CSV with a small split or a library like csv-parse. In Python use the built-in csv and json modules. Vibium never dictates where your data lives.
Should each data row get its own browser in Vibium?
Not a whole browser — a fresh page or context. Launch one browser once, then give each row a new page with bro.page() or an isolated context with bro.newContext(). Contexts have separate cookies and storage, so logins and state from one row never leak into the next, which keeps data-driven runs independent.
How is data-driven testing different from keyword-driven testing?
Data-driven testing varies the inputs while the steps stay fixed — one login flow, many credential pairs. Keyword-driven testing varies the steps themselves, describing each action as a keyword in a table. Data-driven is simpler and covers most needs; use it whenever the same journey must be checked against many values.
Does data-driven testing make Vibium tests flaky?
No, not by itself. Flakiness comes from shared state between rows and brittle selectors, not from the data loop. Isolate each row with its own context, use semantic selectors like role and label, and rely on Vibium's built-in auto-waiting instead of sleeps. Do that and a 500-row run is as stable as a single case.
Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.