Plain English to Browser Actions with Vibium MCP

Describe a web task in plain English and let an AI agent run it in a real browser through Vibium's MCP server — how it works and how to write good prompts.

By Pramod Dutta · 2 min read · Verified with Vibium 26.2

With Vibium's built-in MCP server connected to an AI agent, you can describe a web task in plain English and the agent performs it in a real browser. You say what you want — "log in and add the first product to the cart" — and the agent calls Vibium's tools (navigate, find, click, type, screenshot), reading the page between steps. This is the core of AI-native automation: intent in, browser actions out.

How does plain English turn into actions?

The AI agent runs a sense → think → act loop using Vibium's MCP tools:

  1. Sense — it reads the page (accessibility tree / DOM) through Vibium.
  2. Think — it decides the next step from your instruction and what it sees.
  3. Act — it calls a Vibium tool (click, type, navigate) and observes the result.

It repeats until your goal is met — no selectors written by hand.
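
The loop above can be sketched in a few lines of TypeScript. This is a minimal illustration, not Vibium's implementation: `FakePage`, `sense`, `think`, `act`, and `runLoop` are hypothetical names, the page is an in-memory stand-in for a browser, the URL is a placeholder, and a simple rule-based function replaces the model's decision. Real tool calls would be asynchronous; the sketch is synchronous for brevity.

```typescript
// Hypothetical sketch of a sense -> think -> act loop. None of these names
// are Vibium's API; the "browser" is an in-memory object.

type ToolCall =
  | { tool: "navigate"; url: string }
  | { tool: "type"; target: string; text: string }
  | { tool: "click"; target: string };

interface FakePage {
  url: string;
  username: string;
  loggedIn: boolean;
}

// Sense: return a simplified snapshot of what is visible on the page.
function sense(page: FakePage): string[] {
  if (page.url === "") return [];
  if (page.loggedIn) return ["text:Welcome"];
  return [`input:username=${page.username}`, "button:login"];
}

// Think: a rule-based stand-in for the model choosing the next step.
// Returns null once the goal (being logged in) is visible on the page.
function think(user: string, seen: string[]): ToolCall | null {
  if (seen.length === 0) {
    return { tool: "navigate", url: "https://shop.example/login" }; // placeholder URL
  }
  if (seen.includes("text:Welcome")) return null;
  if (!seen.includes(`input:username=${user}`)) {
    return { tool: "type", target: "username", text: user };
  }
  return { tool: "click", target: "login" };
}

// Act: apply the chosen tool call to the fake page.
function act(page: FakePage, call: ToolCall): void {
  switch (call.tool) {
    case "navigate":
      page.url = call.url;
      break;
    case "type":
      if (call.target === "username") page.username = call.text;
      break;
    case "click":
      if (call.target === "login" && page.username !== "") page.loggedIn = true;
      break;
  }
}

// Repeat sense -> think -> act until the goal is met or maxSteps is reached.
function runLoop(user: string, maxSteps = 10): ToolCall[] {
  const page: FakePage = { url: "", username: "", loggedIn: false };
  const trace: ToolCall[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const seen = sense(page);
    const call = think(user, seen);
    if (call === null) break;
    act(page, call);
    trace.push(call);
  }
  return trace;
}

const trace = runLoop("standard_user");
console.log(trace.map((c) => c.tool).join(" -> ")); // navigate -> type -> click
```

The step cap (`maxSteps`) is a design choice worth keeping in any real loop of this kind: it stops the agent if the page never reaches the goal state.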

A plain-English example

With Vibium MCP connected to an agent such as Claude Code, Cursor, or GitHub Copilot, prompt:

"Open the TTA Cart demo, log in as standard_user, add the TTA Practice Backpack to the cart, and tell me the cart total."

The agent navigates, finds the login fields, types the credentials, clicks add-to-cart, opens the cart, and reads the total back to you.

From prompt to a saved test

Plain English works well for authoring and exploration, but an agent can choose different steps from one run to the next. For a suite you run in CI, ask the agent to write the flow as a Vibium script:

"Now save that as a Vibium test file I can run with node."

You get deterministic, committable code — natural language to draft, real script to keep.
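
To show why a saved script is deterministic, here is a small TypeScript sketch of the idea. It is not what the agent generates and it does not use Vibium's API: `Step`, `Driver`, `savedFlow`, `replay`, and `loggingDriver` are hypothetical, and the URL and selectors are placeholders. A real saved test would call Vibium's client library as described in the official docs. The point is only that a fixed list of steps runs the same way every time, with no model in the loop.

```typescript
// Hypothetical illustration of deterministic replay; not Vibium's API.

type Step =
  | { action: "navigate"; url: string }
  | { action: "type"; selector: string; text: string }
  | { action: "click"; selector: string };

interface Driver {
  navigate(url: string): void;
  type(selector: string, text: string): void;
  click(selector: string): void;
}

// The flow the agent discovered, frozen as data.
// URL and selectors are placeholders for this sketch.
const savedFlow: Step[] = [
  { action: "navigate", url: "https://shop.example/login" },
  { action: "type", selector: "#username", text: "standard_user" },
  { action: "click", selector: "#login" },
  { action: "click", selector: "#add-to-cart" },
];

// Replay each step in order; nothing is decided at run time.
function replay(flow: Step[], driver: Driver): void {
  for (const step of flow) {
    switch (step.action) {
      case "navigate":
        driver.navigate(step.url);
        break;
      case "type":
        driver.type(step.selector, step.text);
        break;
      case "click":
        driver.click(step.selector);
        break;
    }
  }
}

// A logging driver stands in for a real browser.
function loggingDriver(log: string[]): Driver {
  return {
    navigate: (url) => { log.push(`navigate ${url}`); },
    type: (selector, text) => { log.push(`type ${selector} ${text}`); },
    click: (selector) => { log.push(`click ${selector}`); },
  };
}

const firstRun: string[] = [];
const secondRun: string[] = [];
replay(savedFlow, loggingDriver(firstRun));
replay(savedFlow, loggingDriver(secondRun));
console.log(firstRun.join("\n"));
```

Both runs produce the same four log lines in the same order, which is the property a CI suite depends on.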

Frequently asked questions

Can I automate a browser using plain English with Vibium?

Yes. With Vibium's MCP server connected to an AI agent (Claude Code, Cursor, Claude Desktop, or GitHub Copilot Agent), you describe the task in plain English and the agent calls Vibium's tools to perform the clicks, typing, and navigation in a real browser.

How does plain English become browser actions?

The AI agent reads your instruction, picks the right Vibium MCP tools (navigate, find, click, type, screenshot), and calls them in sequence — reading the page back between steps to decide what to do next, the way a person would.

Is plain-English automation reliable enough for tests?

It's excellent for exploration and drafting. For repeatable test suites, have the agent generate a Vibium script you save and run deterministically — use natural language to author, then commit the code.

Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.
