
How to Extract All Links from a Page with Vibium

Extract all links from a page with Vibium in Python — grab every anchor with findAll(), read each href with attr(), and build a clean, de-duplicated URL list.

By Pramod Dutta · 4 min read · Verified with Vibium 26.2

To extract all links from a page with Vibium, navigate to the URL, grab every anchor with findAll("a"), then read each element's attr("href") for the URL and text() for the label. findAll() returns a plain Python list, so you loop over the links as you would any list, collect the href values, and de-duplicate them. Because Vibium drives a real Chrome browser over WebDriver BiDi, it sees links injected by JavaScript, not just the ones in the original HTML, provided they have rendered by the time you call findAll(). The result is a de-duplicated set of the links present on the page, ready to crawl, audit for broken URLs, or feed into a sitemap. There is no driver setup and no headless-browser plumbing to manage yourself.

from vibium import browser_sync as browser
 
vibe = browser.launch()
vibe.go("https://example.com")
 
links = vibe.findAll("a")
for link in links:
    href = link.attr("href")
    label = link.text()
    print(f"{label} -> {href}")
 
vibe.quit()

This opens the page, collects every <a> element with findAll("a"), then prints each link's visible text alongside its href. findAll() returns the whole list of matches (where find() returns only the first), so a single loop walks every anchor on the page.

How does each step work?

  1. vibe.go(url) — opens the page and waits for the load event before you read anything.
  2. vibe.findAll("a") — returns a list of every anchor element. Unlike find(), which gives you the first match, findAll() returns the complete set so you can iterate over all links.
  3. link.attr("href") — reads the href attribute, which holds the link's destination URL.
  4. link.text() — reads the anchor's visible label, useful for context or filtering.
  5. vibe.quit() — shuts the browser down.

findAll() returns immediately and yields an empty list when nothing matches, so a page with no links will not hang your script.
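
The steps above can be wrapped in a small helper that returns data instead of printing it. This is a minimal sketch, assuming each element exposes attr() and text() as in the example above. The names collect_links, Link, and FakeAnchor are hypothetical and used only for illustration. FakeAnchor stands in for a browser element so the block runs without Vibium installed.

```python
from typing import Iterable, NamedTuple, Optional


class Link(NamedTuple):
    label: str
    href: Optional[str]


def collect_links(elements: Iterable) -> list[Link]:
    """Turn anchor elements into (label, href) records.

    Assumes each element has attr(name) and text() methods, as in the
    findAll("a") example above.
    """
    links = []
    for el in elements:
        href = el.attr("href")             # may be missing on button-like anchors
        label = (el.text() or "").strip()  # trim whitespace around the label
        links.append(Link(label=label, href=href))
    return links


class FakeAnchor:
    """Hypothetical stand-in for a browser element so the sketch runs offline."""

    def __init__(self, label, href):
        self._label, self._href = label, href

    def attr(self, name):
        return self._href if name == "href" else None

    def text(self):
        return self._label


sample = [FakeAnchor(" Home ", "/"), FakeAnchor("Menu", None)]
records = collect_links(sample)
```

With a live session, collect_links(vibe.findAll("a")) should produce the same kind of records, which are easier to filter or serialize than printed lines.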

How do I get clean, absolute, de-duplicated URLs?

Raw href values are often relative (/about) or duplicated across a nav bar and footer. Resolve them against the page URL and drop duplicates with urljoin and a set:

from urllib.parse import urljoin
from vibium import browser_sync as browser
 
vibe = browser.launch()
page_url = "https://example.com"
vibe.go(page_url)
 
hrefs = [a.attr("href") for a in vibe.findAll("a")]
clean = sorted({
    urljoin(page_url, h)
    for h in hrefs
    if h and not h.startswith(("#", "javascript:", "mailto:"))
})
 
for url in clean:
    print(url)
 
vibe.quit()

urljoin turns /about into https://example.com/about, the set removes repeats, and the filter skips in-page anchors, javascript: handlers, and mailto: links.
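
The prefix filter above covers the three most common non-page links. A slightly stricter variant, sketched here as one possible approach, keeps only http and https URLs after resolution and strips fragments, so /about and /about#team collapse into one entry. The function name normalize_links and the sample values are assumptions for illustration. The block uses only the standard library.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(page_url, hrefs):
    """Return sorted, unique, absolute http(s) URLs with fragments removed."""
    seen = set()
    for raw in hrefs:
        if not raw:
            continue                  # skip None and empty hrefs
        href = raw.strip()
        if not href or href.startswith("#"):
            continue                  # skip in-page anchors
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            seen.add(absolute)        # mailto:, tel:, javascript: fall through
    return sorted(seen)


sample = [
    "/about", "/about#team", " https://example.com/about ", "#top",
    "mailto:hi@example.com", "tel:+15550100", None, "https://other.test/x",
]
result = normalize_links("https://example.com", sample)
```

Filtering on the resolved scheme rather than on string prefixes also handles schemes the prefix list does not name, such as tel:.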

To separate internal links from external ones, compare each link's hostname against the page's own host. Same host means internal navigation; a different host means an outbound link:

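from urllib.parse import urljoin, urlparse
 
# Reuses page_url and hrefs from the previous example.
base_host = urlparse(page_url).netloc
internal, external = [], []
 
for h in hrefs:
    if not h:
        continue
    full = urljoin(page_url, h)
    parts = urlparse(full)
    if parts.scheme not in ("http", "https"):
        continue  # mailto:, tel:, javascript: have no host and are not outbound pages
    (internal if parts.netloc == base_host else external).append(full)
 
print(f"{len(internal)} internal, {len(external)} external")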

This is the core of a focused crawler: follow internal links to map a site, and report external links for an outbound-link audit.
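
To make the crawler idea concrete, here is a minimal breadth-first sketch. It assumes a caller-supplied get_hrefs(url) function that returns the raw href values for a page, for example by calling go() and findAll("a") in a browser session. The names crawl_internal, get_hrefs, max_pages, and FAKE_SITE are hypothetical. The fake site dictionary replaces the browser so the block runs offline.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_internal(start_url, get_hrefs, max_pages=50):
    """Breadth-first walk over same-host links.

    get_hrefs(url) must return the raw href values found on that page.
    Returns (visited_pages_in_order, sorted_external_links).
    """
    base_host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}            # URLs already queued or visited
    visited, external = [], set()

    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for raw in get_hrefs(page):
            if not raw:
                continue
            full, _fragment = urldefrag(urljoin(page, raw))
            parts = urlparse(full)
            if parts.scheme not in ("http", "https"):
                continue          # ignore mailto:, tel:, javascript:
            if parts.netloc != base_host:
                external.add(full)
            elif full not in seen:
                seen.add(full)
                queue.append(full)
    return visited, sorted(external)


# Hypothetical three-page site used instead of a live browser.
FAKE_SITE = {
    "https://example.com/": ["/a", "/b", "https://other.test/"],
    "https://example.com/a": ["/", "/b#top", None],
    "https://example.com/b": ["mailto:hi@example.com"],
}
pages, outbound = crawl_internal(
    "https://example.com/", lambda url: FAKE_SITE.get(url, [])
)
```

The max_pages cap keeps an unexpectedly large site from running the crawl indefinitely, and the seen set prevents loops between pages that link to each other.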

If links are rendered by JavaScript after load, wait for the first anchor to be visible before collecting. findAll() resolves immediately and would miss late links if you call it too early, so wait on a single element first:

vibe.find("a").wait_until("visible")  # wait until at least one link shows
links = vibe.findAll("a")

wait_until() polls until the element reaches the requested state (visible, hidden, attached, or detached) or the timeout is hit, so a slow-rendering page will not race your script.
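
Waiting for the first anchor only guarantees that one link exists. On pages that add links in batches, the list may still be growing. One possible workaround, sketched below, is to poll the link count until it stops changing. The helper wait_for_stable_count and its parameters are hypothetical and not part of Vibium. It accepts any zero-argument function that returns a count, simulated here with a fixed sequence of readings.

```python
import time


def wait_for_stable_count(count_fn, settle=3, interval=0.25, timeout=10.0):
    """Poll count_fn() until it returns the same value `settle` times in a row.

    Returns the last observed count, whether or not it stabilized
    before the timeout.
    """
    deadline = time.monotonic() + timeout
    last, streak = count_fn(), 1
    while streak < settle and time.monotonic() < deadline:
        time.sleep(interval)
        current = count_fn()
        streak = streak + 1 if current == last else 1
        last = current
    return last


# Simulated page whose link count grows over the first few polls.
readings = iter([2, 5, 9, 9, 9, 9])
final = wait_for_stable_count(lambda: next(readings), interval=0.01)
```

In a real session the count function could be lambda: len(vibe.findAll("a")), followed by one last findAll("a") once the count has settled.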

  • Scope your selectors — use vibe.findAll("main a") or a container selector to skip boilerplate nav and footer links when you only want content links.
  • Filter empty hrefs — anchors used as buttons often have no href; guard with if h before processing.
  • Save as you go — write URLs to a file inside the loop on large pages so a crash does not lose your progress.
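
The "save as you go" tip can be sketched with the standard library alone. The helper below appends each new URL to a text file and flushes after every line, so URLs written before a crash survive. Rerunning it skips URLs already in the file. The name append_new_urls and the links.txt filename are assumptions for illustration.

```python
import os
import tempfile


def append_new_urls(path, urls):
    """Append unseen URLs to a text file, one per line; return how many were added."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            seen = {line.strip() for line in fh if line.strip()}

    added = 0
    with open(path, "a", encoding="utf-8") as fh:
        for url in urls:
            if url in seen:
                continue          # already saved on this or an earlier run
            fh.write(url + "\n")
            fh.flush()            # push each line out so a crash keeps earlier URLs
            seen.add(url)
            added += 1
    return added


# Demo against a temporary directory so nothing is written to the project folder.
out_path = os.path.join(tempfile.mkdtemp(), "links.txt")
first = append_new_urls(out_path, ["https://example.com/a", "https://example.com/b"])
second = append_new_urls(out_path, ["https://example.com/b", "https://example.com/c"])
```

Because the file doubles as the record of what has been seen, an interrupted run can resume without re-saving links it already captured.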

Frequently asked questions

How do I extract all links from a page with Vibium?

Navigate to the page, call findAll('a') to get every anchor element, then read each link's href with attr('href') and the visible label with text(). Collect the pairs into a list, then de-duplicate and resolve relative URLs to get a clean set of absolute links.

How do I get only external links with Vibium?

Pull every href with findAll('a') and attr('href'), then filter the list in Python by comparing each URL's hostname against the page's own domain. Keep links whose host differs to get outbound links, or keep matching hosts for internal navigation.

Does Vibium read links rendered by JavaScript?

Yes. Vibium drives a real Chrome browser over WebDriver BiDi, so findAll('a') sees anchors added by JavaScript after load, just like a user would. If links appear late, wait for one anchor to be visible first, then call findAll to capture the full set.

Vibium is created by Jason Huggins. This is an independent tutorial — see the official Vibium site and GitHub repo for canonical docs.
