Vibium MCP Tools Reference

Vibium MCP tools reference: every browser tool the built-in MCP server exposes — launch, navigate, find, click, type, screenshot, tabs — with arguments.

By Pramod Dutta · 5 min read · Verified with Vibium 26.2

The Vibium MCP server exposes a complete browser toolset that an AI agent calls to navigate, read, interact with, and screenshot a real Chrome browser. Vibium is AI-native browser automation built on WebDriver BiDi: a single Go binary that auto-downloads Chrome for Testing and ships a Model Context Protocol (MCP) server. It was created by Jason Huggins, co-creator of Selenium and Appium. The tools cover the session lifecycle, navigation, page reading, element finding, interaction, screenshots, waiting, and tab management. Each tool advertises a JSON Schema inputSchema so the model knows exactly which arguments it accepts and which are required. You start the server with npx -y vibium mcp (Node) after npm install vibium, or use the Python client via pip install vibium. This reference documents the tool categories, their arguments, and how to list the live catalog for your exact version, since the set grows with each release.

How do you list the live MCP tool catalog?

The authoritative list for your installed version comes straight from the server. MCP hosts discover tools on startup by sending a tools/list request; you can do the same from a terminal:

echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | npx -y vibium mcp

Each entry in the response contains a name, a description, and an inputSchema (JSON Schema). The schema is what the LLM reads to call the tool correctly: it lists parameter names, types, and which parameters are required. Always treat this output as ground truth; the tables below document the stable, commonly available tools.
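
To make the shape of that response concrete, here is a minimal Python sketch that reads a tools/list result and maps each tool name to its required arguments. The sample response is a hypothetical, trimmed illustration rather than real Vibium output; the function name summarize_tools is also an invention for this example.

```python
import json

# Hypothetical, trimmed tools/list response. Real output will contain more
# tools and richer schemas than this illustration.
SAMPLE_RESPONSE = """
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [
  {"name": "browser_navigate",
   "description": "Navigate to a URL",
   "inputSchema": {"type": "object",
                   "properties": {"url": {"type": "string"}},
                   "required": ["url"]}},
  {"name": "browser_quit",
   "description": "Close the browser session",
   "inputSchema": {"type": "object", "properties": {}}}
]}}
"""


def summarize_tools(raw):
    """Map each tool name to the arguments its schema marks as required."""
    response = json.loads(raw)
    summary = {}
    for tool in response["result"]["tools"]:
        schema = tool.get("inputSchema", {})
        # "required" is optional in JSON Schema; absent means nothing is required.
        summary[tool["name"]] = list(schema.get("required", []))
    return summary


if __name__ == "__main__":
    for name, required in summarize_tools(SAMPLE_RESPONSE).items():
        print(name, required)
```

Pointing the same function at the real server output gives a quick per-version checklist of which arguments each tool insists on.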

What are the session lifecycle tools?

These tools open and close the browser. An agent calls browser_launch first and browser_quit last; everything else happens in between.

| Tool | Description | Key arguments |
| --- | --- | --- |
| browser_launch | Start a browser session (visible by default) | headless (boolean, optional) |
| browser_quit | Close the browser session | |

Launch is visible by default, which is ideal while developing an agent so you can watch it work. Pass headless: true for CI or background runs. For host-level configuration, see set up Vibium MCP in Claude Code.
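
One way to switch between visible and headless runs is to derive the launch arguments from the environment. The sketch below builds the MCP tools/call request for browser_launch and sets headless only when a CI variable is present. The CI variable check and both function names are assumptions made for illustration; confirm the headless argument against the live inputSchema for your version.

```python
import json
import os


def launch_arguments(env):
    """Return browser_launch arguments: headless when a CI variable is set."""
    # Assumption: the pipeline sets CI to a non-empty value, as many CI
    # services do. Adjust the check for your environment.
    if env.get("CI"):
        return {"headless": True}
    return {}  # omit headless to keep the visible default


def launch_request(env, request_id=1):
    """Wrap the arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "browser_launch", "arguments": launch_arguments(env)},
    }


if __name__ == "__main__":
    print(json.dumps(launch_request(os.environ)))
```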

What are the navigation and page-reading tools?

These tools move between pages and let the agent perceive what is on screen. Reading the page is how the model decides its next action.

| Tool | Description | Key arguments |
| --- | --- | --- |
| browser_navigate | Navigate to a URL | url (string, required) |
| browser_get_url | Get the current page URL | |
| browser_get_title | Get the current page title | |
| browser_get_text | Get text content of the page or an element | selector (string, optional) |
| browser_get_html | Get HTML (innerHTML or outerHTML) | selector, outer (optional) |

Prefer browser_get_text over browser_get_html when feeding context to the model — text is cheaper and less noisy. The equivalent script-level commands are documented in navigate with go and get text.

What are the element-finding tools?

These tools locate elements so the agent can act on them. Vibium supports both CSS selectors and semantic targeting.

| Tool | Description | Key arguments |
| --- | --- | --- |
| browser_find | Find one element; returns tag, text, bounding box | selector (string, required) |
| browser_find_all | Find all matching elements | selector (string, required) |

In the Python and JS clients, find() also accepts semantic keyword arguments — role, text, label, placeholder, testid — which align with the accessibility tree. That semantic matching is what makes agents resilient when class names or markup change. See find an element for the full matcher set.

What are the interaction tools?

These tools perform actions a user would: clicking, typing, hovering, scrolling, pressing keys, and choosing dropdown options. Vibium auto-waits for actionability before each one, so the agent rarely needs explicit waits.

| Tool | Description | Key arguments |
| --- | --- | --- |
| browser_click | Click an element by CSS selector | selector (string, required) |
| browser_type | Type text into an element | selector, text (required) |
| browser_hover | Hover over an element | selector (string, required) |
| browser_scroll | Scroll the page or an element | selector, direction/amount (optional) |
| browser_keys | Press a key or key combination | keys (string, required) |
| browser_select | Select a dropdown option | selector, value/label (required) |

Auto-waiting means a click waits until the target is visible, stable, receiving events, and enabled. The script equivalents live in click and type text.
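
Because each action auto-waits, an agent's flow can be a plain sequence of tool calls with no explicit waits in between. The sketch below serializes a hypothetical login flow as one JSON-RPC message per line, which is how the MCP stdio transport frames messages. The URL, the selectors, and the helper name to_stdio_lines are placeholders. A real host also performs the MCP initialize handshake and reads each response before sending the next request; both are omitted here.

```python
import json

# Hypothetical login flow: the URL and selectors are placeholders.
FLOW = [
    ("browser_launch", {}),
    ("browser_navigate", {"url": "https://example.com/login"}),
    ("browser_type", {"selector": "#email", "text": "user@example.com"}),
    ("browser_click", {"selector": "button[type=submit]"}),
    ("browser_quit", {}),
]


def to_stdio_lines(flow):
    """Serialize each step as one JSON-RPC tools/call message per line."""
    lines = []
    for request_id, (name, arguments) in enumerate(flow, start=1):
        message = {
            "jsonrpc": "2.0",
            "id": request_id,  # ids increase so responses can be matched up
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
        # Without an indent argument, json.dumps emits no raw newlines,
        # so each message occupies exactly one line.
        lines.append(json.dumps(message))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_stdio_lines(FLOW), end="")
```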

What are the screenshot and wait tools?

Screenshots give the agent (and you) visual proof, browser_evaluate runs JavaScript in the page for cases the other tools cannot cover, and the wait tool handles the rare case where the agent needs to block on a specific element state.

| Tool | Description | Key arguments |
| --- | --- | --- |
| browser_screenshot | Capture a screenshot | selector, filename (optional) |
| browser_evaluate | Execute JavaScript in the page | script (string, required) |
| browser_wait | Wait for an element state | selector, state (required) |

By default, screenshots save to ~/Pictures/Vibium/ (macOS/Linux) or Pictures\Vibium\ on Windows; pass --screenshot-dir when registering the server to change this, or --screenshot-dir "" to return inline base64 only. Use browser_evaluate sparingly — for reading values the other tools cannot reach. See screenshot and wait for an element.
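
When screenshots come back inline, the host has to decode the base64 payload itself. The sketch below assumes the result follows the generic MCP tool-result shape, where image items carry a base64 data field and a mimeType; whether Vibium returns exactly this shape for your version is an assumption to verify against a real response. The sample payload is fabricated and contains only the 8-byte PNG signature, not a real screenshot.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_images(result):
    """Return the decoded bytes of every image item in a tool result."""
    images = []
    for item in result.get("content", []):
        # Text items and other content types are skipped.
        if item.get("type") == "image":
            images.append(base64.b64decode(item["data"]))
    return images


# Fabricated result for illustration only.
SAMPLE_RESULT = {
    "content": [
        {"type": "text", "text": "Screenshot captured"},
        {
            "type": "image",
            "data": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
            "mimeType": "image/png",
        },
    ]
}

if __name__ == "__main__":
    decoded = extract_images(SAMPLE_RESULT)
    print(len(decoded), decoded[0].startswith(PNG_SIGNATURE))
```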

What are the tab management tools?

These tools handle multiple tabs, which agents need for flows that open new windows (auth popups, "open in new tab" links, comparison shopping).

| Tool | Description | Key arguments |
| --- | --- | --- |
| browser_new_tab | Open a new tab | url (optional) |
| browser_list_tabs | List all open tabs | |
| browser_switch_tab | Switch to a tab by index or URL | index or url |
| browser_close_tab | Close a tab | index (optional) |
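
Since browser_switch_tab identifies the target by index or by URL, a host can guard against ambiguous calls before they reach the server. This sketch assumes the two arguments are mutually exclusive, which the table suggests but does not state; the helper name switch_tab_arguments is hypothetical.

```python
def switch_tab_arguments(index=None, url=None):
    """Build browser_switch_tab arguments from exactly one identifier."""
    # Assumption: the server expects either index or url, not both.
    if (index is None) == (url is None):
        raise ValueError("pass exactly one of index or url")
    if index is not None:
        return {"index": index}
    return {"url": url}


if __name__ == "__main__":
    print(switch_tab_arguments(index=1))
    print(switch_tab_arguments(url="https://example.com/checkout"))
```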

How do you confirm a tool's exact arguments?

Inspect the tool's inputSchema from the live tools/list response. The schema is the contract: it names every parameter, its type, and which are required — exactly what the model uses to construct a valid call.

echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
  | npx -y vibium mcp | jq '.result.tools[] | {name, inputSchema}'

If a tool call fails, the schema is the first place to check — a missing required field or a wrong type is the usual cause. For deeper diagnosis, see how to debug the Vibium MCP server.
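
The two usual causes named above, a missing required field and a wrong type, can be checked mechanically before a call is sent. The sketch below is a deliberately small checker, not a full JSON Schema validator: it only looks at required and at simple top-level type keywords. The browser_type schema shown is a hypothetical stand-in for what tools/list returns.

```python
# Mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": str,
    "boolean": bool,
    "integer": int,
    "number": (int, float),
    "object": dict,
    "array": list,
}

# Hypothetical schema for illustration; read the real one from tools/list.
BROWSER_TYPE_SCHEMA = {
    "type": "object",
    "properties": {"selector": {"type": "string"}, "text": {"type": "string"}},
    "required": ["selector", "text"],
}


def check_arguments(schema, arguments):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    properties = schema.get("properties", {})
    for name, value in arguments.items():
        expected = properties.get(name, {}).get("type")
        # Skip names the schema does not describe and non-string type keywords.
        if not isinstance(expected, str) or expected not in JSON_TYPES:
            continue
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, JSON_TYPES[expected]):
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


if __name__ == "__main__":
    print(check_arguments(BROWSER_TYPE_SCHEMA, {"selector": "#q"}))
```

For anything beyond a quick sanity check, a complete JSON Schema validator is the safer choice.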

Frequently asked questions

What tools does the Vibium MCP server expose?

Vibium's built-in MCP server exposes a browser toolset covering the session lifecycle (launch, quit), navigation, reading the page (text, HTML, URL, title), finding elements, interaction (click, type, hover, scroll, keys, select), screenshots, waiting, and tab management — enough for an agent to complete real web tasks.

How do I list the exact MCP tools Vibium provides?

Send a tools/list JSON-RPC request to the server: pipe the request into npx -y vibium mcp with echo and read the result. Each tool entry includes its name, description, and inputSchema, which tells the agent the exact arguments and which are required.

What arguments do Vibium MCP tools take?

Each tool defines a JSON Schema inputSchema. For example, browser_navigate takes a url string, browser_click takes a CSS selector, and browser_type takes a selector plus text. The schema marks required versus optional fields, so the LLM calls each tool correctly.

Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.
