Monitor Broken Links with Vibium
Monitor broken links with Vibium — crawl every anchor, check each URL's HTTP status in a real Chrome browser, and flag dead 404 and 500 links in CI.
To monitor broken links with Vibium, open the page in a real Chrome browser, collect every anchor with findAll("a"), read each link's URL with attr("href"), then check whether each URL returns a healthy HTTP status. Any URL that answers with 404, 410, 500, or a connection error is a broken link. Because Vibium drives Chrome over WebDriver BiDi, it captures links rendered by JavaScript, hidden behind a login, or injected into a single-page app — the exact links a real visitor would click. You extract the URLs with Vibium, then verify each one's status, either from the browser's own network events or with a lightweight HTTP client. Run it headless on a schedule or in CI, fail the build when a dead link appears, and you catch broken navigation before your users, or Google, ever do.
What is a broken-link monitor, and why build one with Vibium?
A broken-link monitor is a script that visits a page, collects every link on it, and reports which of those links no longer work. Dead links quietly erode trust: they frustrate visitors, waste crawl budget, and drag down SEO because search engines treat a page full of 404s as poorly maintained. The problem grows silently — a partner site takes a page down, a product URL changes, an internal route gets renamed, and nothing warns you until a user complains.
Vibium is a strong fit for this job when your links are not sitting in static HTML. It is AI-native browser automation built on WebDriver BiDi, shipped as a single Go binary that auto-downloads Chrome. That means it renders the page like a browser, so links added by JavaScript, revealed after login, or loaded inside a client-side router all show up in your crawl. A plain HTTP crawler that only reads raw HTML would miss every one of them.
The pattern is two clean stages: extract the links with Vibium, then check each URL's status. The rest of this guide builds that pipeline step by step, in both JavaScript and Python, and shows how to wire it into CI so it runs on every deploy.
How do I collect every link on a page with Vibium?
Start by opening the page and grabbing every anchor element with findAll("a"), then read each one's href. findAll() returns the complete list of matches (where find() returns only the first), so a single loop walks every link on the page.
const { browser } = require('vibium/sync')

const bro = browser.launch({ headless: true })
const vibe = bro.page()
vibe.go('https://example.com')

// Grab every anchor, read its destination and visible label
const links = vibe.findAll('a').map((a) => ({
  href: a.attr('href'),
  label: a.text().trim(),
}))

console.log(`Found ${links.length} links`)
bro.close()

Here is the same extraction in Python, which reads almost identically:
from vibium import browser_sync as browser

vibe = browser.launch(headless=True)
vibe.go("https://example.com")

links = [
    {"href": a.attr("href"), "label": a.text().strip()}
    for a in vibe.findAll("a")
]

print(f"Found {len(links)} links")
vibe.quit()

Each a.attr("href") returns the raw value of the anchor's href attribute, and a.text() gives you the visible label, which is useful later for reporting which link on the page is broken, not just its URL.
How do I clean and de-duplicate the URLs first?
Clean the list before you check anything, because raw hrefs are messy: many are relative (/about), plenty are duplicated across the nav and footer, and some are not real destinations at all (#, mailto:, javascript:). Checking those wastes requests and produces noisy reports.
from urllib.parse import urljoin
from vibium import browser_sync as browser

page_url = "https://example.com"

vibe = browser.launch(headless=True)
vibe.go(page_url)
raw = [a.attr("href") for a in vibe.findAll("a")]
vibe.quit()

# Resolve relative URLs, drop non-links, de-duplicate
clean = sorted({
    urljoin(page_url, h)
    for h in raw
    if h and not h.startswith(("#", "javascript:", "mailto:", "tel:"))
})

print(f"{len(clean)} unique URLs to check")

urljoin turns /about into https://example.com/about, the set collapses duplicates, and the filter skips in-page anchors and non-HTTP schemes. Now you have a tidy, absolute, de-duplicated list, which is exactly what a status checker needs.
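One refinement the snippet above does not make: /about and /about#team point at the same resource, so they would be checked twice. The sketch below is one possible way to fold that in using only the standard library. The helper name normalize_links and the SKIP_PREFIXES constant are illustrative, not part of Vibium.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Prefixes that never lead to an HTTP destination (same list as the guide's filter)
SKIP_PREFIXES = ("#", "javascript:", "mailto:", "tel:")


def normalize_links(page_url, hrefs):
    """Resolve raw href values against page_url, drop non-links, strip fragments, de-duplicate."""
    unique = set()
    for href in hrefs:
        if not href:
            continue  # attribute missing or empty
        href = href.strip()
        if not href or href.lower().startswith(SKIP_PREFIXES):
            continue
        # urldefrag splits off "#section" so /about and /about#team collapse to one URL
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            unique.add(absolute)
    return sorted(unique)


sample = ["/about", "/about#team", "#top", None, "mailto:hi@example.com"]
print(normalize_links("https://example.com", sample))
```

In the pipeline, the list passed as hrefs would be the raw values collected with a.attr("href").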
How do I check whether each link is broken?
A link is broken when its URL returns an HTTP status of 400 or higher, or fails to connect at all. Vibium extracts the URLs; to read each URL's status, the simplest and fastest approach is a plain HTTP client that sends a request per URL and reads the response code. Vibium drives a browser — it is not a bulk HTTP checker — so pairing it with requests (Python) or fetch (Node) is the idiomatic split of labor.
Here is the Python checker. A HEAD request is cheapest because it asks only for headers, not the body; fall back to GET for the servers that reject HEAD.
import requests

def check(url):
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        # Some servers reject HEAD, so retry with GET
        if r.status_code >= 400:
            r = requests.get(url, allow_redirects=True, timeout=10, stream=True)
        return r.status_code
    except requests.RequestException:
        return 0  # connection failed / timeout / DNS error

broken = []
for url in clean:
    status = check(url)
    if status == 0 or status >= 400:
        broken.append((url, status))
        print(f"BROKEN {status or 'ERR'} {url}")

print(f"\n{len(broken)} broken out of {len(clean)} links")

The Node.js equivalent uses the built-in fetch, so there is nothing extra to install:
// Assumes `clean` is the de-duplicated URL array from the previous step,
// and that this file runs as an ES module (it uses top-level await).
async function check(url) {
  try {
    let res = await fetch(url, { method: 'HEAD', redirect: 'follow' })
    if (res.status >= 400) {
      res = await fetch(url, { method: 'GET', redirect: 'follow' })
    }
    return res.status
  } catch {
    return 0 // network error
  }
}

const broken = []
for (const url of clean) {
  const status = await check(url)
  if (status === 0 || status >= 400) {
    broken.push({ url, status })
    console.log(`BROKEN ${status || 'ERR'} ${url}`)
  }
}

console.log(`\n${broken.length} broken out of ${clean.length} links`)

Treat a status of 0 (a thrown network error) as broken too: a link that times out or fails DNS is just as dead to a user as a 404.
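The loops above check one URL at a time, which gets slow once a page has a few hundred links. A minimal sketch of running the same check on a small thread pool follows. The names check_all and find_broken are hypothetical, and the check argument is assumed to be a function like the check(url) shown earlier that returns a status code, or 0 on a network error.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check, workers=8):
    """Run check(url) for every URL on a thread pool and return {url: status}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so statuses line up with urls
        statuses = list(pool.map(check, urls))
    return dict(zip(urls, statuses))


def find_broken(statuses):
    """Keep only the entries the guide treats as broken: 0 or any status >= 400."""
    return sorted((url, s) for url, s in statuses.items() if s == 0 or s >= 400)


# Stand-in checker so the sketch runs without network access
fake_statuses = {"https://example.com/": 200, "https://example.com/gone": 404}
result = check_all(fake_statuses, lambda url: fake_statuses[url])
print(find_broken(result))
```

Keep the worker count modest: a large pool pointed at one host is a quick way to earn 429 responses.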
Which status codes should I flag?
Not every non-200 response is a problem, so decide what "broken" means before you start failing builds. This table is a sane default policy.
| Status range | Meaning | Treat as | Why |
|---|---|---|---|
| 200–299 | Success | Healthy | The link works. |
| 301, 302, 307, 308 | Redirect | Log, not fail | Fine, but long chains slow pages and can hide a dying URL. |
| 401, 403 | Auth required / forbidden | Usually skip | Often expected for gated links; allowlist these hosts. |
| 404, 410 | Not found / gone | Broken | The classic dead link. Fail here. |
| 429 | Rate limited | Retry, then warn | You hit the server too fast, which is not a real break. Back off. |
| 500–599 | Server error | Broken | The target is down or erroring. Fail here. |
| 0 (no response) | Timeout / DNS / refused | Broken | The user sees nothing load. Fail here. |
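The policy table above can be sketched as a small lookup function. The function name classify and its return labels are illustrative. One assumption goes beyond the table: any other 4xx code is treated as broken, following the guide's "400 or higher" rule, and anything else unlisted is simply logged.

```python
def classify(status):
    """Map an HTTP status (0 = no response) to the action from the policy table."""
    if status == 0:
        return "broken"   # timeout / DNS failure / connection refused
    if 200 <= status <= 299:
        return "healthy"
    if status in (301, 302, 307, 308):
        return "log"      # redirect: record it, do not fail
    if status in (401, 403):
        return "skip"     # usually gated on purpose
    if status == 429:
        return "retry"    # rate limited: back off, then warn
    if status in (404, 410) or 500 <= status <= 599:
        return "broken"
    if 400 <= status <= 499:
        return "broken"   # assumption: remaining 4xx codes count as broken
    return "log"          # assumption: other unlisted codes are only logged


for code in (200, 301, 403, 404, 429, 503, 0):
    print(code, classify(code))
```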
Following redirects (allow_redirects / redirect: 'follow') means a 301 resolves to its final destination, so you check where the link actually lands, not just the hop. Without it, a permanently moved page would report 301 and pass your check even if the URL it redirects to is itself a 404 — the break would hide one hop away.
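To make the "break hiding one hop away" concrete, here is a sketch that follows redirects by hand and records every hop. The function resolve_chain and the fetch callable are hypothetical: fetch(url) is assumed to return a (status, location) pair for a single non-following request. With requests, the already-followed chain is also available on the response's history attribute.

```python
from urllib.parse import urljoin

REDIRECTS = {301, 302, 303, 307, 308}


def resolve_chain(url, fetch, max_hops=5):
    """Follow redirects manually; return (final_status, [every URL visited])."""
    chain = [url]
    status, location = fetch(url)
    # Stop on a non-redirect, a missing Location, or after max_hops hops
    while status in REDIRECTS and location and len(chain) <= max_hops:
        url = urljoin(url, location)  # Location may be relative
        chain.append(url)
        status, location = fetch(url)
    return status, chain


# Stand-in responses: a 301 that leads, two hops later, to a 404
fake = {
    "https://a.example/old": (301, "/mid"),
    "https://a.example/mid": (302, "https://a.example/new"),
    "https://a.example/new": (404, None),
}
status, chain = resolve_chain("https://a.example/old", lambda u: fake[u])
print(status, "after", len(chain) - 1, "hops")
```

The hop count, len(chain) - 1, is what you would compare against a chain-length threshold when logging.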
One more nuance worth encoding: run your checks with a realistic User-Agent header. Some servers return 403 or serve a bot-challenge page to clients that look automated, which would flag a perfectly healthy link as broken. Setting a normal browser user agent on your HTTP client sidesteps that false alarm, and mirrors what Vibium's real Chrome already sends when it renders the page.
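A minimal sketch of attaching a browser-like User-Agent, written with the standard library's urllib so it runs without extra packages. The helper name build_request is hypothetical and the user-agent string is only an illustrative value, not the one Vibium's Chrome sends. With requests, the equivalent is passing headers={"User-Agent": ...} to requests.head or requests.get.

```python
import urllib.request

# Illustrative browser-style value; substitute whatever your real browser reports
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_request(url, method="HEAD", user_agent=BROWSER_UA):
    """Build (but do not send) a request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, method=method, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
print(req.get_method(), req.get_header("User-agent"))
```

The request object would then be sent with urllib.request.urlopen(req, timeout=10). Note that urllib normalizes the header key to "User-agent" internally.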
How do I catch links Vibium's own browser already loaded?
There is a second, zero-HTTP-client way to catch broken links: watch the traffic Vibium's browser makes as it renders the page. Vibium exposes live network events through on_response, so every request the page fires — assets, XHRs, API calls, and any navigation — reports its status without a separate checker.
from vibium import browser_sync as browser

vibe = browser.launch(headless=True)

# Log any response the page receives with a failing status
failures = []

def record_failure(res):
    if res.status() >= 400:
        failures.append((res.status(), res.url()))

vibe.on_response(record_failure)
vibe.go("https://example.com")

for status, url in failures:
    print(f"{status} {url}")

vibe.quit()

Register the listener before go(), because callbacks only capture traffic that happens after they are attached. This catches a different, valuable class of breakage: a broken image, a 500 from a background API the UI silently swallows, a stylesheet that 404s, or a tracking script that never loads. Those never appear in your <a href> list, but they still degrade the page: a missing hero image is as visible to a user as a dead link, and a failing API call can leave half the page blank.
The trade-off is scope. on_response only sees requests the browser actually makes while rendering this page, so it reports the assets and calls that fire during load, not the destinations of links the user has not clicked. That is exactly why the two methods pair so well: one audits the page's own health as it loads, the other audits everywhere the page points.
In practice, use on_response to audit what the page loads, and the findAll("a") plus status-check pipeline to audit where the page links. For deeper request inspection, the guide on monitoring network requests with Vibium covers wait_for_response, filtering, and reading response bodies.
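Once the listener has collected (status, url) pairs, grouping them by resource type makes the report easier to scan. The sketch below is one possible way to do that from the URL's file extension. The group_failures helper and the KINDS mapping are illustrative, and guessing the type from the extension is a rough heuristic, not something Vibium provides.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Rough extension-to-kind mapping; extend it for your own asset types
KINDS = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
    ".svg": "image", ".webp": "image",
    ".css": "stylesheet",
    ".js": "script",
}


def group_failures(failures):
    """Group (status, url) pairs by a kind guessed from the URL path's extension."""
    grouped = defaultdict(list)
    for status, url in failures:
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        grouped[KINDS.get(suffix, "other")].append((status, url))
    return dict(grouped)


sample = [
    (404, "https://example.com/img/hero.png"),
    (500, "https://api.example.com/v1/cart"),
    (404, "https://example.com/site.css?v=3"),
]
for kind, items in group_failures(sample).items():
    print(kind, items)
```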
How do I check links that need JavaScript or a login?
This is where Vibium earns its place over a static crawler: it checks links exactly as a browser renders them. If your links only appear after JavaScript runs, wait for the first anchor to be visible before collecting — findAll() resolves immediately and would miss late links if you call it too early.
# Wait until at least one link has rendered, then collect them all
vibe.find("a").wait_for(state="visible")
links = vibe.findAll("a")

For links behind authentication, log in first, then crawl the pages a signed-in user sees. Vibium keeps the session for the life of the browser, so every page you visit afterward carries the logged-in state.
from vibium import browser_sync as browser

vibe = browser.launch(headless=True)
vibe.go("https://app.example.com/login")
vibe.find("#email").type("user@example.com")
vibe.find("#password").type("s3cret")
vibe.find({"role": "button", "text": "Sign in"}).click()

# Now crawl a page only visible after login
vibe.go("https://app.example.com/dashboard")
gated_links = [a.attr("href") for a in vibe.findAll("a")]
vibe.quit()

The automate a login flow with Vibium guide covers this handshake in depth, including waiting for the post-login page and handling redirects. Semantic selectors like find({"role": "button", "text": "Sign in"}) also make the login step resilient when class names change; the find element reference lists every strategy you can combine.
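The login example hard-codes an email and password, which is fine for a demo but not for a script that lives in a repository or runs in CI. A minimal sketch of reading them from the environment instead follows. The variable names LINKCHECK_EMAIL and LINKCHECK_PASSWORD and the helper load_credentials are hypothetical, so pick whatever names your CI secrets use.

```python
import os


def load_credentials(env=os.environ):
    """Read login credentials from environment variables, failing fast if one is missing."""
    try:
        return env["LINKCHECK_EMAIL"], env["LINKCHECK_PASSWORD"]
    except KeyError as missing:
        # SystemExit with a message exits non-zero, so a CI run fails visibly
        raise SystemExit(f"Missing environment variable: {missing.args[0]}")


# Demo with an explicit mapping so the sketch runs anywhere
email, password = load_credentials(
    {"LINKCHECK_EMAIL": "user@example.com", "LINKCHECK_PASSWORD": "s3cret"}
)
print(email)
```

The returned values would replace the string literals passed to type() in the login snippet.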
How do I check links across a whole site, not one page?
To monitor a whole site, turn the single-page crawl into a small breadth-first crawler: start at the homepage, follow internal links, and check every URL you encounter along the way. Keep two structures — a queue of pages left to visit and a set of URLs already seen — so you never crawl the same page twice.
from collections import deque
from urllib.parse import urljoin, urlparse
from vibium import browser_sync as browser
import requests

start = "https://example.com"
host = urlparse(start).netloc

vibe = browser.launch(headless=True)
to_visit = deque([start])
visited = set()
broken = []

while to_visit:
    # popleft() takes the oldest queued page, which makes the crawl breadth-first
    page = to_visit.popleft()
    if page in visited:
        continue
    visited.add(page)
    vibe.go(page)

    for a in vibe.findAll("a"):
        href = a.attr("href")
        if not href or href.startswith(("#", "mailto:", "javascript:", "tel:")):
            continue
        url = urljoin(page, href)

        # Check the link's status
        try:
            code = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            code = 0
        if code == 0 or code >= 400:
            broken.append((page, url, code))

        # Queue internal pages to crawl next
        if urlparse(url).netloc == host and url not in visited:
            to_visit.append(url)

vibe.quit()

print(f"Crawled {len(visited)} pages, found {len(broken)} broken links")
for src, url, code in broken:
    print(f" {code or 'ERR'} {url} (on {src})")

Comparing each link's hostname against the start host keeps the crawl inside your own site: same host means an internal page worth following, a different host means an outbound link you check but do not recurse into. For very large sites, add a page limit or a max-depth counter so a run stays bounded, and cache each URL's status in the visited-style set so you never check the same link twice. The extract all links guide goes deeper on splitting internal from external links.
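The page limit, depth counter, and status cache suggested above can be sketched as follows. To keep the example runnable without a browser, the Vibium and HTTP parts are passed in as callables. Both are assumptions of this sketch: get_links(page) would wrap vibe.go(page) plus findAll("a") and return absolute URLs, and check(url) would return a status code, or 0 on failure. Unlike the crawler above, this version also declines to crawl into internal pages that are themselves broken.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start, get_links, check, max_pages=50, max_depth=3):
    """Breadth-first crawl bounded by page count and depth, checking each URL once."""
    host = urlparse(start).netloc
    queue = deque([(start, 0)])
    visited = set()
    status_cache = {}  # url -> status, so repeated links cost one request
    broken = []

    while queue and len(visited) < max_pages:
        page, depth = queue.popleft()
        if page in visited:
            continue
        visited.add(page)

        for url in get_links(page):
            if url not in status_cache:
                status_cache[url] = check(url)
            code = status_cache[url]
            if code == 0 or code >= 400:
                broken.append((page, url, code))
                continue  # do not crawl into a dead page
            internal = urlparse(url).netloc == host
            if internal and url not in visited and depth < max_depth:
                queue.append((url, depth + 1))

    return visited, broken


# Stand-in site graph so the sketch runs offline
site = {
    "https://example.com/": ["https://example.com/a", "https://other.example/x"],
    "https://example.com/a": ["https://example.com/", "https://example.com/dead"],
}
pages, dead = crawl(
    "https://example.com/",
    lambda page: site.get(page, []),
    lambda url: 404 if url.endswith("/dead") else 200,
)
print(len(pages), dead)
```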
Vibium vs a static link checker: which should I use?
Choose based on where your links live. A static crawler is faster and lighter for pure HTML; Vibium is the right tool when links depend on a real browser to appear. Here is an honest comparison.
| Consideration | Vibium | Static link checker |
|---|---|---|
| JavaScript-rendered links | Captured — renders the page in Chrome | Missed — reads raw HTML only |
| Links behind login | Handled — log in, then crawl | Needs manual cookie/session wiring |
| Single-page app routes | Handled — sees the rendered DOM | Often missed |
| Raw speed on static sites | Slower — boots a browser per run | Faster — pure HTTP, high concurrency |
| Resource use | Heavier (a Chrome process) | Light |
| Broken images / failed XHRs | Caught via on_response | Not visible |
| Setup | pip install vibium / npm install vibium, no drivers | Varies by tool |
When to choose Vibium: your links are rendered client-side, gated behind auth, or live in an app you already test with Vibium — so extraction and checking share one browser session and one toolchain. When to choose a static checker: you are auditing a large, fully static site and want maximum throughput on plain HTTP requests. Many teams run both — a fast static sweep for the public site, and a Vibium pass for the logged-in app a static crawler cannot reach. For a broader tooling comparison, see Vibium vs Playwright and Vibium vs Selenium.
How do I run the broken-link check automatically in CI?
Make the check pay off by running it on every deploy and failing the build when a dead link appears. The recipe is: run the script headless, collect broken links, print each one with the page it was found on, and exit with a non-zero status code if the list is not empty. A non-zero exit is what turns the CI check red, which makes a dead link a signal your team actually sees.
import sys

# ... crawl + fill `broken` as above ...

if broken:
    print(f"\n{len(broken)} broken link(s) found:")
    for src, url, code in broken:
        print(f" {code or 'ERR'} {url} (on {src})")
    sys.exit(1)  # fail the build

print("All links healthy.")
sys.exit(0)

A minimal GitHub Actions job runs it on a schedule and on every push. Vibium auto-downloads Chrome, so there is no browser-install step to maintain.
name: broken-link-check
on:
  push:
  schedule:
    - cron: '0 6 * * *'  # every day at 06:00 UTC
jobs:
  links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install vibium requests
      - run: python check_links.py

Because it runs headless and boots its own Chrome, the same script works unchanged on a laptop, a cron box, or a CI runner. The run on a server guide covers headless flags, sandbox settings, and the handful of Linux packages a bare container needs. To review CI failures with screenshots and a step-by-step timeline, Vibium's tracing captures a replay of each run, which is handy when a link breaks only in a specific environment.
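Before the script exits, it helps to drop failures you already expect, so the build only goes red for real breaks. A minimal sketch follows, assuming broken holds (source_page, url, status) tuples as in the crawler. The exit_code helper and the allowed_hosts set are hypothetical names.

```python
from urllib.parse import urlparse


def exit_code(broken, allowed_hosts=frozenset()):
    """Return 1 if any unexpected broken link remains, else 0.

    A 401/403 from a host in allowed_hosts is treated as expected and ignored.
    """
    failures = [
        (src, url, code)
        for src, url, code in broken
        if not (code in (401, 403) and urlparse(url).netloc in allowed_hosts)
    ]
    for src, url, code in failures:
        print(f" {code or 'ERR'} {url} (on {src})")
    return 1 if failures else 0


found = [
    ("https://example.com/", "https://members.example.com/area", 403),
    ("https://example.com/", "https://example.com/old-page", 404),
]
print(exit_code(found, allowed_hosts={"members.example.com"}))
```

The script would end with sys.exit(exit_code(broken, allowed_hosts=...)) in place of the unconditional check.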
Tips for a reliable broken-link monitor
- Follow redirects, but log the chain. Resolve 301/302 to the final URL so you check where a link truly lands, and flag chains longer than two hops, since they are slow and often precede a break.
- Allowlist known-gated hosts. Links that return 401/403 by design (a members area, a paywalled partner) will cry wolf; keep an allowlist so real breaks stand out.
- Back off on 429. Rate limiting is not a broken link. Add a short delay between requests, or retry once after a pause, before you flag it.
- Deduplicate before checking. A homepage may link the same URL a dozen times; check each unique URL once with a set to keep runs fast and reports clean.
- Report the source page. Always log which page a broken link was found on, not just the dead URL, because it turns a vague alert into a one-minute fix.
- Scope the crawl. Cap pages, depth, or run time so a huge site or an accidental link loop cannot make a run hang forever.
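The "back off on 429" tip can be sketched as a thin wrapper around whatever status checker you already have. The names check_with_backoff, retries, and delay are illustrative, and check is assumed to be a function that returns a status code for a URL. The sleep parameter exists only so the example can run without actually waiting.

```python
import time


def check_with_backoff(url, check, retries=2, delay=1.0, sleep=time.sleep):
    """Re-run check(url) after a growing pause while the server answers 429."""
    status = check(url)
    for attempt in range(retries):
        if status != 429:
            break
        sleep(delay * (2 ** attempt))  # 1s, then 2s, then 4s, ...
        status = check(url)
    return status


# Stand-in checker: rate limited twice, then healthy
responses = iter([429, 429, 200])
waits = []
final = check_with_backoff(
    "https://example.com/", lambda url: next(responses), sleep=waits.append
)
print(final, waits)
```

If the status is still 429 after the last retry, the policy table suggests a warning rather than a build failure.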
Once this is running green in CI, fold the same discipline into the rest of your suite: pair link checks with a full-page screenshot for visual regressions, and structure larger crawlers with a page object model so selectors stay in one place. If you want AI to help build and run these checks, the Vibium MCP for Claude Code setup lets an agent drive the browser for you.
Frequently asked questions
How do I monitor broken links with Vibium?
Open the page with Vibium, collect every anchor using findAll('a'), read each href with attr('href'), then check each URL's HTTP status. Any URL that returns 400 or higher — a 404 or 500 — is a broken link. Run the script on a schedule or in CI to catch dead links before users do.
Can Vibium check the status code of a link?
Yes, in two ways. Vibium sees live traffic through page.onResponse (on_response in the Python examples in this guide), so it captures the status of every request the browser makes. To check links it did not navigate to, pair Vibium's link extraction with a plain HTTP client such as Python requests or fetch and read each response's status code.
Does Vibium find links added by JavaScript?
Yes. Vibium drives a real Chrome browser over WebDriver BiDi, so findAll('a') captures anchors injected by JavaScript after load, not just links in the original HTML. Wait for one anchor to be visible first, then call findAll to capture the complete, fully rendered set of links.
How do I run a broken-link check automatically?
Save your Vibium script and run it headless on a schedule — a cron job, a GitHub Actions workflow, or a CI step. Exit with a non-zero status when any broken link is found so the build fails loudly, and log the offending URL plus the page it was found on for a quick fix.
What counts as a broken link?
A link is broken when its URL returns an HTTP status of 400 or higher: 404 (not found), 410 (gone), 500 (server error), or a connection failure. Redirects (301, 302) are usually fine but worth logging, since long redirect chains slow pages and can mask a URL that will eventually die.
Is Vibium better than a static link checker for this?
It depends. A static crawler is faster for pure HTTP checks. Vibium wins when links only appear after JavaScript runs, sit behind a login, or load inside a single-page app — because it checks the page exactly as a real browser renders it, then hands the URLs to your status checker.
Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.