The Snapshot Format

Every action returns a structured snapshot your LLM can immediately reason about. This is BabelWrap's core innovation.

Why Snapshots?

Traditional browser automation gives agents raw HTML -- thousands of lines of nested divs, scripts, and styles. An LLM has to parse all of that to understand a page. BabelWrap solves this by converting any webpage into a clean, structured representation that contains only what matters: the content, the input fields, the clickable actions, the forms, and the navigation.

Every action endpoint returns a snapshot of the page after the action completes. Your agent always knows the current page state without extra API calls.

JSON Format

The snapshot is a JSON object with these top-level fields:

{
  "url": "https://example.com/login",
  "title": "Sign In -- Example",
  "content": "Welcome back. Sign in to continue.",
  "inputs": [
    {
      "id": "email-field",
      "label": "Email address",
      "type": "text",
      "value": ""
    },
    {
      "id": "password-field",
      "label": "Password",
      "type": "password",
      "value": ""
    },
    {
      "id": "remember-me",
      "label": "Remember me",
      "type": "checkbox",
      "checked": false
    }
  ],
  "actions": [
    {
      "id": "sign-in-btn",
      "label": "Sign In",
      "type": "button",
      "primary": true
    },
    {
      "id": "forgot-password",
      "label": "Forgot password?",
      "type": "link",
      "href": "/reset"
    },
    {
      "id": "create-account",
      "label": "Create account",
      "type": "link",
      "href": "/signup"
    },
    {
      "id": "google-login",
      "label": "Continue with Google",
      "type": "button"
    }
  ],
  "navigation": ["Home", "Products", "Pricing", "Blog", "Docs"],
  "alerts": [],
  "forms": [
    {
      "id": "login-form",
      "fields": ["email-field", "password-field", "remember-me"],
      "submit": "sign-in-btn"
    }
  ],
  "tables": [],
  "lists": []
}

Field Reference

Field	Type	Description
`url`	string	Current page URL
`title`	string	Document title
`content`	string	Main readable text content of the page
`inputs`	array	All form fields (text, email, password, checkbox, radio, select, textarea)
`actions`	array	All clickable elements (buttons, links) with labels and types
`navigation`	array	Site navigation link labels
`alerts`	array	Visible messages: errors, success banners, warnings
`forms`	array	Logical form groupings with their fields and submit buttons
`tables`	array	Structured table data extracted from the page
`lists`	array	Structured list data extracted from the page
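
As a minimal sketch of consuming this structure from Python, the top-level fields can be modeled as a `TypedDict` and checked on load. The names `Snapshot` and `parse_snapshot` are illustrative and not part of BabelWrap, and the check assumes every top-level field is always present, as it is in the example above.

```python
import json
from typing import Any, TypedDict


class Snapshot(TypedDict):
    """Top-level snapshot fields, mirroring the field reference table."""
    url: str
    title: str
    content: str
    inputs: list[dict[str, Any]]
    actions: list[dict[str, Any]]
    navigation: list[str]
    alerts: list[dict[str, Any]]
    forms: list[dict[str, Any]]
    tables: list[Any]
    lists: list[Any]


def parse_snapshot(raw: str) -> Snapshot:
    """Hypothetical helper: decode a snapshot and verify its top-level fields."""
    data = json.loads(raw)
    # Assumption: all ten fields are always returned, even when empty.
    missing = [key for key in Snapshot.__annotations__ if key not in data]
    if missing:
        raise ValueError(f"snapshot is missing fields: {missing}")
    return data
```

Failing fast on a missing field keeps later agent logic from silently treating an incomplete response as an empty page.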

Input Fields

Each entry in `inputs` represents a form field:

Property	Type	Description
`id`	string	Unique identifier for this element (used internally for resolution)
`label`	string	Human-readable label (from `<label>`, placeholder, or aria-label)
`type`	string	Input type: text, email, password, checkbox, radio, select, textarea
`value`	string	Current value of the field (empty string if not filled); the checkbox in the example above carries `checked` and no `value`
`checked`	boolean	For checkboxes and radios: whether the field is checked
`options`	array	For select fields: list of available options

Action Elements

Each entry in `actions` represents a clickable element:

Property	Type	Description
`id`	string	Unique identifier
`label`	string	Visible text or aria-label
`type`	string	Element type: button, link, tab, menu-item
`href`	string|null	Link destination (for links only)
`primary`	boolean	Whether this appears to be the primary/main action on the page
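
The action properties above lend themselves to simple lookups. The sketch below assumes a snapshot already decoded into a Python dict; `primary_action` and `link_targets` are hypothetical helper names, not BabelWrap functions.

```python
from typing import Optional


def primary_action(snapshot: dict) -> Optional[dict]:
    """Return the first action flagged as primary, or None if there is none."""
    for action in snapshot.get("actions", []):
        # `primary` is absent on most actions in the example, so default to falsy.
        if action.get("primary"):
            return action
    return None


def link_targets(snapshot: dict) -> dict[str, str]:
    """Map each link label to its href, skipping actions without a destination."""
    return {
        action["label"]: action["href"]
        for action in snapshot.get("actions", [])
        if action.get("type") == "link" and action.get("href")
    }
```

For the login example, `primary_action` would return the `sign-in-btn` entry and `link_targets` would map "Forgot password?" to `/reset` and "Create account" to `/signup`.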

Form Groupings

Each entry in `forms` groups related fields together:

Property	Type	Description
`id`	string	Form identifier
`fields`	array	List of input IDs that belong to this form
`submit`	string	ID of the submit button/action for this form
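
Because `fields` and `submit` hold IDs rather than full objects, an agent usually needs to join them back to `inputs` and `actions`. A minimal sketch of that join follows; `resolve_form` is a hypothetical helper, and it assumes every ID in a form appears in the same snapshot.

```python
def resolve_form(snapshot: dict, form_id: str) -> dict:
    """Replace a form's ID references with the input and action entries they name."""
    inputs_by_id = {field["id"]: field for field in snapshot["inputs"]}
    actions_by_id = {action["id"]: action for action in snapshot["actions"]}

    for form in snapshot["forms"]:
        if form["id"] == form_id:
            return {
                "id": form_id,
                # Preserve the field order given by the form grouping.
                "fields": [inputs_by_id[fid] for fid in form["fields"]],
                "submit": actions_by_id[form["submit"]],
            }
    raise KeyError(f"no form with id {form_id!r}")
```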

Alerts

Each entry in `alerts` represents a visible page message:

Property	Type	Description
`text`	string	Alert message text
`type`	string	Alert type: error, success, warning, info
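
Grouping alerts by `type` makes it easy to branch on errors versus confirmations. The sketch below is illustrative only: `alerts_by_type` is a hypothetical helper, and the alert text in the usage note is made up for the example.

```python
from collections import defaultdict


def alerts_by_type(snapshot: dict) -> dict[str, list[str]]:
    """Group alert messages under their type (error, success, warning, info)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for alert in snapshot.get("alerts", []):
        grouped[alert["type"]].append(alert["text"])
    # Return a plain dict so missing types raise KeyError instead of appearing empty.
    return dict(grouped)
```

Given a snapshot whose `alerts` array holds one entry of type `error` with the hypothetical text "Invalid email or password", the result has a single `error` key containing that message.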

Text Representation

When an MCP agent reads a snapshot, it sees a text representation optimized for LLMs:

URL: https://example.com/login
TITLE: Sign In -- Example

CONTENT:
  Welcome back. Sign in to continue.

INPUTS:
  [email-field] Email address (text, empty)
  [password-field] Password (password, empty)
  [remember-me] Remember me (checkbox, unchecked)

ACTIONS:
  [sign-in-btn] Sign In (button, primary)
  [forgot-password] Forgot password? (link -> /reset)
  [create-account] Create account (link -> /signup)
  [google-login] Continue with Google (button)

NAVIGATION:
  Home | Products | Pricing | Blog | Docs

FORMS:
  [login-form] fields: email-field, password-field, remember-me -> submit: sign-in-btn

ALERTS:
  (none)

This text format is far more compact than the raw HTML it replaces while retaining all the information needed to decide on the next action.
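
To make the mapping between the two representations concrete, here is a rough sketch that renders a decoded JSON snapshot into the layout shown above. It is a reconstruction from the example, not BabelWrap's own renderer: the function names are hypothetical, and the formatting of filled values and non-empty alerts is an assumption because the source shows neither.

```python
def describe_input(field: dict) -> str:
    """Format one entry of `inputs` as in the INPUTS section."""
    if field["type"] in ("checkbox", "radio"):
        state = "checked" if field.get("checked") else "unchecked"
    else:
        # Assumption: the source only shows "empty"; quoting a filled value is a guess.
        value = field.get("value", "")
        state = f'"{value}"' if value else "empty"
    return f"[{field['id']}] {field['label']} ({field['type']}, {state})"


def describe_action(action: dict) -> str:
    """Format one entry of `actions` as in the ACTIONS section."""
    detail = action["type"]
    if action.get("href"):
        detail += f" -> {action['href']}"
    if action.get("primary"):
        detail += ", primary"
    return f"[{action['id']}] {action['label']} ({detail})"


def describe_form(form: dict) -> str:
    """Format one entry of `forms` as in the FORMS section."""
    fields = ", ".join(form["fields"])
    return f"[{form['id']}] fields: {fields} -> submit: {form['submit']}"


def section(title: str, lines: list[str]) -> list[str]:
    """Return a titled, indented block followed by a blank separator line."""
    body = lines or ["(none)"]
    return [f"{title}:"] + [f"  {line}" for line in body] + [""]


def render_snapshot(s: dict) -> str:
    """Render a snapshot dict as LLM-oriented text."""
    out = [f"URL: {s['url']}", f"TITLE: {s['title']}", ""]
    out += section("CONTENT", [s["content"]] if s["content"] else [])
    out += section("INPUTS", [describe_input(i) for i in s["inputs"]])
    out += section("ACTIONS", [describe_action(a) for a in s["actions"]])
    out += section("NAVIGATION", [" | ".join(s["navigation"])] if s["navigation"] else [])
    out += section("FORMS", [describe_form(f) for f in s["forms"]])
    # Assumption: the layout of a non-empty ALERTS section is not shown in the source.
    out += section("ALERTS", [f"[{a['type']}] {a['text']}" for a in s["alerts"]])
    return "\n".join(out).rstrip()
```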

Using Snapshots Effectively

Check the snapshot after every action

Every action returns a fresh snapshot. Use it to verify that the action worked and to decide what to do next. For example, after submitting a login form, check whether `url` changed to a dashboard page or whether `alerts` contains an error message.
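
One way to express that check, as a sketch: compare the snapshot taken before the submit with the one returned after it. `login_outcome` is a hypothetical helper, and treating any URL change as success is a simplifying assumption.

```python
def login_outcome(before: dict, after: dict) -> str:
    """Classify a login attempt from the snapshots before and after the submit."""
    errors = [
        alert["text"]
        for alert in after.get("alerts", [])
        if alert.get("type") == "error"
    ]
    if errors:
        # An explicit error alert outranks any URL change.
        return "failed: " + "; ".join(errors)
    if after["url"] != before["url"]:
        return "succeeded"
    # Same URL and no error alert: the snapshot alone cannot settle it.
    return "unknown"
```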

Reference elements by their labels

When calling `click`, `fill`, or `submit`, describe elements using the label text from the snapshot. For example, if the snapshot shows `[sign-in-btn] Sign In (button, primary)`, use "the Sign In button" as your target.
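
A small sketch of building such a target description from an `actions` entry follows. `target_phrase` is a hypothetical helper; it only assembles the phrase and does not call any BabelWrap endpoint.

```python
# Map the documented action types to the noun used in a natural-language target.
TYPE_NOUNS = {
    "button": "button",
    "link": "link",
    "tab": "tab",
    "menu-item": "menu item",
}


def target_phrase(action: dict) -> str:
    """Turn an action entry into a description such as 'the Sign In button'."""
    noun = TYPE_NOUNS.get(action["type"], "element")
    return f"the {action['label']} {noun}"
```

Describing the target by label rather than by `id` matches the guidance above, since the `id` is used internally for resolution.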

Use forms to understand page structure

The `forms` array shows which fields belong together and which button submits them. This is especially useful on pages with multiple forms.

Watch for alerts

After submitting a form, check the `alerts` array for validation errors, success messages, or warnings. This tells your agent whether the action succeeded without needing to parse page content.

The snapshot format is designed to be self-documenting. An LLM reading a snapshot for the first time can immediately understand the page structure and decide on the next action without any prior training on the format.