How to Build an AI Agent That Browses the Web with Vibium
Build an AI agent that browses the web with Vibium — wire its built-in MCP server to an LLM so the agent can navigate, read, click, and screenshot live pages.
You build an AI agent that browses the web with Vibium by connecting its built-in MCP server to an LLM, then letting the model drive a real browser in a loop. Vibium is AI-native browser automation built on WebDriver BiDi: a single Go binary that auto-downloads Chrome for Testing and ships a Model Context Protocol (MCP) server out of the box. Because Vibium exposes actions like browser_navigate, browser_find, browser_click, browser_type, and browser_screenshot as MCP tools, any MCP-capable agent (Claude Code, Cursor, or a custom client) can perceive a page, decide what to do, and act without any glue code. Vibium was created by Jason Huggins, co-creator of Selenium and Appium, and it auto-waits for elements to become actionable, so your agent runs into far fewer timing bugs. Install it with pip install vibium or npm install vibium. This guide covers the agent loop, the MCP wiring, and a runnable Python control script.
What is an AI agent that browses the web?
A web-browsing agent is an LLM that can take actions in a real browser instead of only generating text. It runs a perceive → decide → act loop: read the current page, reason about the goal, then call a tool to navigate, click, type, or capture a screenshot — repeating until the task is done.
Vibium is purpose-built for this. It is a single Go binary that speaks WebDriver BiDi, auto-downloads Chrome for Testing, and ships a built-in MCP server. That means the model never touches raw protocol messages — it just calls named tools and reads results back in plain text.
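The perceive → decide → act loop itself fits in a few lines of plain Python. This is a minimal sketch, not Vibium's API: `run_agent` and its `perceive`, `decide`, and `act` callables are hypothetical hooks that you would back with Vibium calls and an LLM. A toy click counter stands in for the page so the sketch runs on its own.

```python
from typing import Callable, Optional


def run_agent(
    perceive: Callable[[], str],
    decide: Callable[[str, str], Optional[dict]],
    act: Callable[[dict], str],
    goal: str,
    max_steps: int = 10,
) -> list:
    """Run perceive -> decide -> act until the planner returns None or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = perceive()            # e.g. read the page's accessibility tree
        action = decide(goal, observation)  # e.g. ask the LLM for the next tool call
        if action is None:                  # planner says the goal is reached
            break
        history.append(act(action))         # e.g. click or type through Vibium
    return history


# Toy stand-ins so the sketch runs without a browser or an LLM:
# the "page" is a click counter and the "planner" stops at three clicks.
state = {"clicks": 0}


def toy_act(action: dict) -> str:
    state["clicks"] += 1
    return f"{action['tool']} #{state['clicks']}"


log = run_agent(
    perceive=lambda: f"clicks={state['clicks']}",
    decide=lambda goal, obs: None if obs == "clicks=3" else {"tool": "click"},
    act=toy_act,
    goal="click three times",
)
# log == ["click #1", "click #2", "click #3"]
```

The max_steps cap is a deliberate design choice: it bounds cost and keeps a confused planner from looping forever.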
Why use Vibium instead of a DIY browser wrapper?
Vibium removes the two hardest parts of agentic browsing: protocol plumbing and flaky timing. The MCP server hands your model a ready-made tool catalog, and Vibium auto-waits for actionability — checking that an element is visible, stable, receiving events, and enabled before acting — so the agent doesn't have to sprinkle sleep() calls everywhere.
| Concern | DIY wrapper | Vibium |
|---|---|---|
| Tool definitions for the LLM | You write and maintain them | Built-in MCP server |
| Waiting for elements | Manual sleeps / retries | Auto-waits for actionability |
| Real browser | You wire up CDP/WebDriver | Auto-downloads Chrome for Testing |
| Semantic targeting | Custom code | find(role=…, text=…, label=…) |
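For comparison, the "manual sleeps / retries" row usually turns into a polling helper like the one below, wrapped around every action in a DIY wrapper. This is a generic sketch, not Vibium's implementation. `wait_until` and the toy `element_ready` check are hypothetical; a real check would query visibility, stability, and enabled state through the browser protocol.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll condition() until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False  # give up: the element never became actionable
        time.sleep(interval)


# Toy check that "becomes actionable" on the third poll.
polls = {"count": 0}


def element_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


ready = wait_until(element_ready, timeout=1.0, interval=0.01)
```

Every action in a DIY wrapper needs a helper like this plus a tuned timeout; Vibium's auto-wait moves that work out of your agent code.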
How do you wire Vibium's MCP server to an agent?
Register Vibium's MCP server with your agent host. For Claude Code, one command adds it:
```bash
claude mcp add vibium -- npx -y vibium mcp
```

Start a new session so the host re-runs tool discovery, then confirm that it connected:

```bash
claude mcp list
```

Chrome for Testing downloads automatically the first time the browser launches. Now you can hand the agent a goal in plain English, such as "Go to news.ycombinator.com and give me the top three story titles", and it will chain Vibium tools to finish the job. For the full host setup, see set up Vibium MCP in Claude Code, and for the complete tool catalog see the Vibium MCP tools reference.
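Under the hood, the host and the server exchange MCP's JSON-RPC 2.0 messages. Each time the model picks a tool, the host sends a `tools/call` request roughly like the one this sketch builds. It is a simplified illustration: a real session first performs the MCP `initialize` handshake (omitted here), `tools_call` is a hypothetical helper, and the `url` argument name for browser_navigate is an assumption, so confirm the schema with `tools/list`.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build one MCP tools/call request as a newline-terminated JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport carries one JSON message per line, so no embedded newlines.
    return json.dumps(message) + "\n"


# Assumption: browser_navigate takes a "url" argument; confirm via tools/list.
line = tools_call(1, "browser_navigate", {"url": "https://news.ycombinator.com"})
```

This is only useful if you write a custom MCP client; hosts like Claude Code build these messages for you.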
How do you script the agent loop in Python?
If you want a custom agent rather than an MCP host, drive Vibium directly with its Python client and let your LLM choose actions. The loop is: read the page, ask the model for the next step, execute it with Vibium, repeat. Here is the act layer your agent calls into:
```python
from vibium import browser_sync as browser

# One shared browser session that every tool below acts on
vibe = browser.launch()

# The browser tools your agent's planner can call
def navigate(url):
    vibe.go(url)
    return f"Loaded {url}"

def read_page():
    # Give the model the page's structure, not raw HTML
    return vibe.a11y_tree()

def click_text(text):
    vibe.find(role="button", text=text).click()
    return f"Clicked {text}"

def type_into(selector, value):
    vibe.find(selector).type(value)
    return f"Typed into {selector}"

def snapshot(path="step.png"):
    # screenshot() returns PNG bytes; write them to disk for review
    png = vibe.screenshot()
    with open(path, "wb") as f:
        f.write(png)
    return path
```

The key idea: feed the model a compact, semantic view of the page. Vibium's a11y_tree() returns the accessibility tree (roles, names, and state), which is far cheaper and more stable to reason over than raw HTML. The model then picks an action, you call the matching function, and you loop.
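Something has to turn the model's chosen action into a call to one of the functions above. A minimal sketch, assuming you prompt the model to reply with JSON shaped like `{"tool": "navigate", "args": {"url": "..."}}`. That reply format and the `dispatch` helper are hypothetical conventions, not part of Vibium. Errors come back as text so the model can read them and correct itself, and a stand-in tool keeps the demo runnable without a browser.

```python
import json
from typing import Callable


def dispatch(reply: str, tools: dict[str, Callable[..., str]]) -> str:
    """Run the tool named in the model's JSON reply and return a text observation."""
    try:
        action = json.loads(reply)
    except json.JSONDecodeError:
        return "Error: reply was not valid JSON"
    if not isinstance(action, dict):
        return "Error: expected a JSON object"
    name = action.get("tool")
    args = action.get("args", {})
    if not isinstance(name, str) or name not in tools:
        return f"Error: unknown tool {name!r}"
    if not isinstance(args, dict):
        return "Error: args must be a JSON object"
    try:
        return tools[name](**args)
    except TypeError as exc:  # typically wrong or missing argument names
        return f"Error: bad arguments for {name}: {exc}"


# In a real agent the registry maps to the functions above, e.g.
#   {"navigate": navigate, "read_page": read_page, "click_text": click_text, ...}
# A stand-in tool keeps this demo runnable without a browser.
demo_tools: dict[str, Callable[..., str]] = {"navigate": lambda url: f"Loaded {url}"}
result = dispatch('{"tool": "navigate", "args": {"url": "https://example.com"}}', demo_tools)
```

Returning errors as observations instead of raising keeps the loop alive: the next perceive step shows the model what went wrong.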
How does the agent perceive a page reliably?
Use the accessibility tree as the agent's eyes. It exposes each element the way assistive tech sees it — role, name, and state like checked — so the model can target elements semantically instead of guessing CSS selectors.
```python
tree = vibe.a11y_tree()
# tree -> {"role": "WebArea", "children": [
#     {"role": "textbox", "name": "Search"},
#     {"role": "button", "name": "Submit"}, ...]}

# The agent decides "click Submit", and you act:
vibe.find(role="button", text="Submit").click()
```

Vibium's find() accepts semantic keyword arguments (role, text, label, placeholder, testid) that map cleanly onto what the model reads in the tree. That alignment is what makes the loop dependable: the agent's plan ("click the Submit button") translates directly into a Vibium call. Learn more in find an element.
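Before the tree goes into a prompt, it helps to flatten it into short indented lines. A sketch, assuming a11y_tree() returns nested dicts with role, name, optional checked, and children keys like the example above; the real return shape may differ, so adjust the key names. `flatten_tree` is a hypothetical helper.

```python
def flatten_tree(node: dict, depth: int = 0) -> list:
    """Render an accessibility-tree dict as indented 'role "name"' lines."""
    line = "  " * depth + node.get("role", "?")
    name = node.get("name")
    if name:
        line += f' "{name}"'
    if "checked" in node:  # surface state such as checked=True
        line += f" checked={node['checked']}"
    lines = [line]
    for child in node.get("children", []):
        lines.extend(flatten_tree(child, depth + 1))  # indent one level per depth
    return lines


tree = {"role": "WebArea", "children": [
    {"role": "textbox", "name": "Search"},
    {"role": "checkbox", "name": "Remember me", "checked": False},
    {"role": "button", "name": "Submit"},
]}
prompt_view = "\n".join(flatten_tree(tree))
# WebArea
#   textbox "Search"
#   checkbox "Remember me" checked=False
#   button "Submit"
```

Each line pairs a role with a name, which is exactly what find(role=..., text=...) needs to act on the model's choice.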
How do you let the agent confirm its own work?
Have the agent take a screenshot after key steps and read text back to verify the outcome. Closing this feedback loop is what separates a reliable agent from one that blindly fires actions.
```python
vibe.go("https://example.com")
vibe.find("a").click()
print(vibe.find("h1").text())  # verify we landed where we expected
png = vibe.screenshot()        # visual proof for the agent or a human
vibe.quit()
```

Because text() reads the live DOM and screenshot() returns real PNG bytes, your agent can check that a form submitted, a banner appeared, or a price matched, then decide whether to retry or move on. For a real flow, see automate login with Vibium.
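Those two signals can be turned into a decision the planner acts on. A minimal sketch: `check_step` and its three outcomes are hypothetical, and in a live run you would pass it vibe.find("h1").text() and vibe.screenshot(). The PNG check only confirms the bytes begin with the standard 8-byte PNG signature; it does not inspect the image.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def check_step(page_text: str, expected: str, png: bytes) -> str:
    """Return 'ok', 'retry', or 'error' so the planner knows what to do next."""
    if not png.startswith(PNG_SIGNATURE):
        return "error"  # no usable screenshot: something broke, escalate
    if expected.lower() in page_text.lower():
        return "ok"     # the page shows what the agent expected
    return "retry"      # the page loaded but shows the wrong content


# Stand-in values; a live run would pass vibe.find("h1").text() and vibe.screenshot().
fake_png = PNG_SIGNATURE + b"<image data>"
decisions = [
    check_step("Order confirmed", "confirmed", fake_png),
    check_step("Payment failed", "confirmed", fake_png),
    check_step("Order confirmed", "confirmed", b""),
]
# decisions == ["ok", "retry", "error"]
```

Separating "retry" from "error" lets the agent try another action on a wrong page but stop (or ask a human) when the browser itself misbehaves.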
When should you build an agent like this?
Build a Vibium browsing agent when your task needs live web interaction the model can verify: research across pages, reproducing a bug on a real site, QA-walking a checkout, or pulling data behind a login. If you only need a fixed, deterministic script with no LLM in the loop, a plain Vibium script (no agent) is simpler and cheaper. For an LLM-orchestrated workflow that adapts to whatever the page shows, the MCP-plus-agent approach shines — and it scales from a one-off Claude Code session to a fully custom planner. See agentic web testing with Vibium for the testing-focused version of this pattern.
Frequently asked questions
How do I build an AI agent that browses the web with Vibium?
Connect Vibium's built-in MCP server to an LLM-based agent. Vibium exposes browser actions like navigate, find, click, type, and screenshot as MCP tools, so the model can drive a real Chrome browser by calling those tools in a perceive-decide-act loop, with no custom protocol code.
Does a Vibium web agent control a real browser?
Yes. Vibium launches real Chrome for Testing and drives it over WebDriver BiDi. Your agent interacts with live pages exactly as a person would, so it sees real DOM, real JavaScript, and real network responses rather than a simulated or text-only version of the web.
What does an AI browsing agent actually do on each step?
It perceives the page (reads text or the accessibility tree), decides the next action, then acts by calling a Vibium tool such as click or type. Vibium auto-waits for elements to be actionable, so the loop repeats reliably until the agent reaches its goal.
Vibium was created by Jason Huggins. This is an independent tutorial; see the official Vibium site and GitHub repo for canonical docs.