The Snapshot Format

Every action returns a structured snapshot your LLM can immediately reason about. This is BabelWrap's core innovation.

Why Snapshots?

Traditional browser automation gives agents raw HTML -- thousands of lines of nested divs, scripts, and styles. An LLM has to parse all of that to understand a page. BabelWrap solves this by converting any webpage into a clean, structured representation that contains only what matters: the content, the input fields, the clickable actions, the forms, and the navigation.

Every action endpoint returns a snapshot of the page after the action completes. Your agent always knows the current page state without extra API calls.

JSON Format

The snapshot is a JSON object with these top-level fields:

{
  "url": "https://example.com/login",
  "title": "Sign In -- Example",
  "content": "Welcome back. Sign in to continue.",
  "inputs": [
    {
      "id": "email-field",
      "label": "Email address",
      "type": "text",
      "value": ""
    },
    {
      "id": "password-field",
      "label": "Password",
      "type": "password",
      "value": ""
    },
    {
      "id": "remember-me",
      "label": "Remember me",
      "type": "checkbox",
      "checked": false
    }
  ],
  "actions": [
    {
      "id": "sign-in-btn",
      "label": "Sign In",
      "type": "button",
      "primary": true
    },
    {
      "id": "forgot-password",
      "label": "Forgot password?",
      "type": "link",
      "href": "/reset"
    },
    {
      "id": "create-account",
      "label": "Create account",
      "type": "link",
      "href": "/signup"
    },
    {
      "id": "google-login",
      "label": "Continue with Google",
      "type": "button"
    }
  ],
  "navigation": ["Home", "Products", "Pricing", "Blog", "Docs"],
  "alerts": [],
  "forms": [
    {
      "id": "login-form",
      "fields": ["email-field", "password-field", "remember-me"],
      "submit": "sign-in-btn"
    }
  ],
  "tables": [],
  "lists": []
}
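A snapshot is ordinary JSON, so any JSON parser can read it. The minimal Python sketch below uses only the standard library. `SNAPSHOT_JSON` is an abridged, hand-copied version of the example above. It stands in for whatever response body your client actually receives.

```python
import json

# Abridged copy of the example snapshot above (illustrative data only,
# standing in for a real BabelWrap response body).
SNAPSHOT_JSON = """
{
  "url": "https://example.com/login",
  "title": "Sign In -- Example",
  "content": "Welcome back. Sign in to continue.",
  "inputs": [
    {"id": "email-field", "label": "Email address", "type": "text", "value": ""},
    {"id": "remember-me", "label": "Remember me", "type": "checkbox", "checked": false}
  ],
  "actions": [
    {"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": true}
  ],
  "navigation": ["Home", "Products", "Pricing", "Blog", "Docs"],
  "alerts": [],
  "forms": [
    {"id": "login-form", "fields": ["email-field", "remember-me"], "submit": "sign-in-btn"}
  ],
  "tables": [],
  "lists": []
}
"""

snapshot = json.loads(SNAPSHOT_JSON)

# After parsing, top-level fields are plain Python values (str, list, dict, bool).
print(snapshot["title"])  # Sign In -- Example
for item in snapshot["inputs"]:
    print(item["id"], "->", item["label"], f"({item['type']})")
```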

Field Reference

Field       Type    Description
url         string  Current page URL
title       string  Document title
content     string  Main readable text content of the page
inputs      array   All form fields (text, email, password, checkbox, radio, select, textarea)
actions     array   All clickable elements (buttons, links) with labels and types
navigation  array   Site navigation link labels
alerts      array   Visible messages: errors, success banners, warnings
forms       array   Logical form groupings with their fields and submit buttons
tables      array   Structured table data extracted from the page
lists       array   Structured list data extracted from the page
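The field reference maps directly onto Python types. The sketch below checks a parsed snapshot against it. `validate_snapshot` is a hypothetical helper, not part of BabelWrap. It also assumes every top-level field is always present, because the reference does not say whether empty fields may be omitted.

```python
from typing import Any

# Expected Python type for each top-level field, taken from the field reference.
EXPECTED_TYPES: dict[str, type] = {
    "url": str,
    "title": str,
    "content": str,
    "inputs": list,
    "actions": list,
    "navigation": list,
    "alerts": list,
    "forms": list,
    "tables": list,
    "lists": list,
}


def validate_snapshot(snapshot: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means well-formed).

    Assumes every top-level field is always present in a snapshot.
    """
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in snapshot:
            problems.append(f"missing field: {name}")
        elif not isinstance(snapshot[name], expected):
            actual = type(snapshot[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems
```

Problems are reported in the order of the table, so the output can be logged or passed back to the agent as-is.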

Input Fields

Each entry in inputs represents a form field:

Property  Type     Description
id        string   Unique identifier for this element (used internally for resolution)
label     string   Human-readable label (from <label>, placeholder, or aria-label)
type      string   Input type: text, email, password, checkbox, radio, select, textarea
value     string   Current value of the field (empty string if not filled)
checked   boolean  For checkboxes and radios: whether the field is checked
options   array    For select fields: list of available options
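Not every property appears on every input. In the example snapshot, the checkbox carries `checked` but no `value`, and only select fields carry `options`. A small dataclass can normalize these differences. `InputField` and `needs_text` are hypothetical names, and treating a missing `value` as an empty string is an assumption. `options` are kept exactly as given, because the shape of each option is not specified here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Input types that take free text (from the "type" row above).
TEXT_LIKE = {"text", "email", "password", "textarea"}


@dataclass
class InputField:
    """Hypothetical normalized view of one entry in `inputs`."""
    id: str
    label: str
    type: str
    value: str = ""                    # assumption: "" when the key is absent
    checked: Optional[bool] = None     # only meaningful for checkbox/radio
    options: list[Any] = field(default_factory=list)  # only for select; kept as-is

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "InputField":
        return cls(
            id=raw["id"],
            label=raw["label"],
            type=raw["type"],
            value=raw.get("value", ""),
            checked=raw.get("checked"),
            options=list(raw.get("options", [])),
        )

    def needs_text(self) -> bool:
        """True for a text-like field that is still empty."""
        return self.type in TEXT_LIKE and self.value == ""
```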

Action Elements

Each entry in actions represents a clickable element:

Property  Type         Description
id        string       Unique identifier
label     string       Visible text or aria-label
type      string       Element type: button, link, tab, menu-item
href      string|null  Link destination (for links only)
primary   boolean      Whether this appears to be the primary/main action on the page
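Two common lookups on `actions` are finding the primary action and collecting link destinations. The sketch below does both. The helper names are hypothetical. It assumes `primary` may be absent, as it is on the links in the example snapshot, and treats an absent flag as false.

```python
from typing import Any, Optional


def primary_action(snapshot: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical helper: first action flagged as primary, or None."""
    for action in snapshot.get("actions", []):
        if action.get("primary", False):  # missing "primary" counts as False
            return action
    return None


def link_targets(snapshot: dict[str, Any]) -> dict[str, str]:
    """Hypothetical helper: map each link's label to its href (skips null hrefs)."""
    return {
        action["label"]: action["href"]
        for action in snapshot.get("actions", [])
        if action.get("type") == "link" and action.get("href")
    }


example = {
    "actions": [
        {"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True},
        {"id": "forgot-password", "label": "Forgot password?", "type": "link", "href": "/reset"},
        {"id": "create-account", "label": "Create account", "type": "link", "href": "/signup"},
    ]
}
print(primary_action(example)["id"])  # sign-in-btn
print(link_targets(example))  # {'Forgot password?': '/reset', 'Create account': '/signup'}
```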

Form Groupings

Each entry in forms groups related fields together:

Property  Type    Description
id        string  Form identifier
fields    array   List of input IDs that belong to this form
submit    string  ID of the submit button/action for this form
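`fields` and `submit` hold IDs rather than full objects, so an agent usually resolves them against `inputs` and `actions`. The sketch below shows one way to do this. `resolve_form` is a hypothetical helper, and it raises `KeyError` if any ID cannot be found.

```python
from typing import Any


def resolve_form(snapshot: dict[str, Any], form_id: str) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Hypothetical helper: return (input objects, submit action) for one form.

    Raises KeyError if the form, one of its fields, or its submit action is missing.
    """
    inputs_by_id = {item["id"]: item for item in snapshot["inputs"]}
    actions_by_id = {item["id"]: item for item in snapshot["actions"]}
    forms_by_id = {item["id"]: item for item in snapshot["forms"]}
    form = forms_by_id[form_id]
    fields = [inputs_by_id[field_id] for field_id in form["fields"]]
    return fields, actions_by_id[form["submit"]]


snapshot = {
    "inputs": [
        {"id": "email-field", "label": "Email address", "type": "text", "value": ""},
        {"id": "password-field", "label": "Password", "type": "password", "value": ""},
    ],
    "actions": [{"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True}],
    "forms": [{"id": "login-form", "fields": ["email-field", "password-field"], "submit": "sign-in-btn"}],
}
fields, submit = resolve_form(snapshot, "login-form")
print([f["label"] for f in fields], "->", submit["label"])  # ['Email address', 'Password'] -> Sign In
```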

Alerts

Each entry in alerts represents a visible page message:

Property  Type    Description
text      string  Alert message text
type      string  Alert type: error, success, warning, info
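Grouping alerts by `type` lets an agent check for errors with a single lookup. The sketch below uses a hypothetical `alerts_by_type` helper. The sample alert texts are invented for illustration.

```python
from collections import defaultdict
from typing import Any


def alerts_by_type(snapshot: dict[str, Any]) -> dict[str, list[str]]:
    """Hypothetical helper: group alert texts under their type (error, success, ...)."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for alert in snapshot.get("alerts", []):
        grouped[alert["type"]].append(alert["text"])
    return dict(grouped)


# Invented sample alerts, for illustration only.
sample = {
    "alerts": [
        {"text": "Incorrect password.", "type": "error"},
        {"text": "Caps Lock is on.", "type": "warning"},
    ]
}
grouped = alerts_by_type(sample)
if grouped.get("error"):
    print("Errors:", grouped["error"])  # Errors: ['Incorrect password.']
```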

Text Representation

When an MCP agent reads a snapshot, it sees a text representation optimized for LLMs:

URL: https://example.com/login
TITLE: Sign In -- Example

CONTENT:
  Welcome back. Sign in to continue.

INPUTS:
  [email-field] Email address (text, empty)
  [password-field] Password (password, empty)
  [remember-me] Remember me (checkbox, unchecked)

ACTIONS:
  [sign-in-btn] Sign In (button, primary)
  [forgot-password] Forgot password? (link -> /reset)
  [create-account] Create account (link -> /signup)
  [google-login] Continue with Google (button)

NAVIGATION:
  Home | Products | Pricing | Blog | Docs

FORMS:
  [login-form] fields: email-field, password-field, remember-me -> submit: sign-in-btn

ALERTS:
  (none)

This text format is far smaller than the page's raw HTML, so it fits comfortably in an LLM context window while retaining all the information needed to decide on the next action.
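The sketch below shows how the JSON fields map onto this layout. It is a renderer that reproduces the example above from a JSON snapshot, inferred from that single example, and it is not BabelWrap's own renderer. The way filled text fields and non-empty alerts are shown is not documented here, so two choices in the code are assumptions: a quoted value, and `[type] text` alert lines. Tables and lists are left out, as in the example.

```python
from typing import Any


def render_input(item: dict[str, Any]) -> str:
    kind = item["type"]
    if kind in ("checkbox", "radio"):
        state = "checked" if item.get("checked") else "unchecked"
    else:
        value = item.get("value", "")
        state = f'"{value}"' if value else "empty"  # assumption: filled values shown quoted
    return f"  [{item['id']}] {item['label']} ({kind}, {state})"


def render_action(item: dict[str, Any]) -> str:
    if item["type"] == "link" and item.get("href"):
        detail = f"link -> {item['href']}"
    elif item.get("primary"):
        detail = f"{item['type']}, primary"
    else:
        detail = item["type"]
    return f"  [{item['id']}] {item['label']} ({detail})"


def render_text(snap: dict[str, Any]) -> str:
    """Approximate the text representation; tables and lists are not rendered."""
    lines = [f"URL: {snap['url']}", f"TITLE: {snap['title']}", "",
             "CONTENT:", f"  {snap['content']}", "", "INPUTS:"]
    lines += [render_input(i) for i in snap["inputs"]]
    lines += ["", "ACTIONS:"]
    lines += [render_action(a) for a in snap["actions"]]
    lines += ["", "NAVIGATION:", "  " + " | ".join(snap["navigation"]), "", "FORMS:"]
    for form in snap["forms"]:
        fields = ", ".join(form["fields"])
        lines.append(f"  [{form['id']}] fields: {fields} -> submit: {form['submit']}")
    lines += ["", "ALERTS:"]
    if snap["alerts"]:
        lines += [f"  [{a['type']}] {a['text']}" for a in snap["alerts"]]  # assumed layout
    else:
        lines.append("  (none)")
    return "\n".join(lines)


example = {
    "url": "https://example.com/login",
    "title": "Sign In -- Example",
    "content": "Welcome back. Sign in to continue.",
    "inputs": [
        {"id": "email-field", "label": "Email address", "type": "text", "value": ""},
        {"id": "remember-me", "label": "Remember me", "type": "checkbox", "checked": False},
    ],
    "actions": [
        {"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True},
        {"id": "forgot-password", "label": "Forgot password?", "type": "link", "href": "/reset"},
    ],
    "navigation": ["Home", "Products", "Pricing", "Blog", "Docs"],
    "alerts": [],
    "forms": [{"id": "login-form", "fields": ["email-field", "remember-me"], "submit": "sign-in-btn"}],
}
print(render_text(example))
```

For this abridged sample, the printed output follows the same layout as the text representation above. It simply has fewer inputs and actions.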

Using Snapshots Effectively

Check the snapshot after every action

Every action returns a fresh snapshot. Use it to verify that the action worked and to decide what to do next. For example, after submitting a login form, check whether url has changed to a dashboard page or whether alerts contains an error message.
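The sketch below performs that check by comparing the snapshots before and after a submit. `submit_outcome` is a hypothetical helper. It treats a URL change without an error alert as navigation, which is a heuristic rather than a guarantee of success.

```python
from typing import Any


def submit_outcome(before: dict[str, Any], after: dict[str, Any]) -> str:
    """Hypothetical heuristic: classify a submit as 'error', 'navigated', or 'unchanged'."""
    has_error = any(a.get("type") == "error" for a in after.get("alerts", []))
    if has_error:
        return "error"
    if after.get("url") != before.get("url"):
        return "navigated"
    return "unchanged"


before = {"url": "https://example.com/login", "alerts": []}
after_ok = {"url": "https://example.com/dashboard", "alerts": []}
after_bad = {
    "url": "https://example.com/login",
    "alerts": [{"text": "Invalid email or password.", "type": "error"}],  # invented text
}
print(submit_outcome(before, after_ok))   # navigated
print(submit_outcome(before, after_bad))  # error
```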

Reference elements by their labels

When calling click, fill, or submit, describe elements using the label text from the snapshot. For example, if the snapshot shows [sign-in-btn] Sign In (button, primary), use "the Sign In button" as your target.
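Before naming a target, an agent can confirm that the label exists in the current snapshot. It can then phrase the target from that entry. In the sketch below, `find_action` and `target_phrase` are hypothetical helpers. The "the <label> <type>" phrasing only mirrors the example above and is not a required syntax.

```python
from typing import Any, Optional


def find_action(snapshot: dict[str, Any], label: str) -> Optional[dict[str, Any]]:
    """Hypothetical helper: case-insensitive exact match on an action's label."""
    wanted = label.strip().lower()
    for action in snapshot.get("actions", []):
        if action["label"].strip().lower() == wanted:
            return action
    return None


def target_phrase(action: dict[str, Any]) -> str:
    """Describe an action by its visible label, e.g. 'the Sign In button'."""
    return f"the {action['label']} {action['type']}"


snapshot = {"actions": [
    {"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True},
    {"id": "create-account", "label": "Create account", "type": "link", "href": "/signup"},
]}
action = find_action(snapshot, "sign in")
if action is not None:
    print(target_phrase(action))  # the Sign In button
```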

Use forms to understand page structure

The forms array shows which fields belong together and which button submits them. This is especially useful on pages with multiple forms.
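Consider a page with both a login form and a newsletter signup, each with its own email field. Here the label alone is ambiguous, and forms resolves it. In the sketch below, `forms_with_label` is a hypothetical helper, and the two-form page is invented for illustration.

```python
from typing import Any


def forms_with_label(snapshot: dict[str, Any], label: str) -> list[tuple[str, str]]:
    """Hypothetical helper: (form id, submit id) for every form containing a field with this label."""
    ids_with_label = {i["id"] for i in snapshot["inputs"] if i["label"] == label}
    return [
        (form["id"], form["submit"])
        for form in snapshot["forms"]
        if ids_with_label.intersection(form["fields"])
    ]


# Invented two-form page, for illustration only.
page = {
    "inputs": [
        {"id": "login-email", "label": "Email address", "type": "email", "value": ""},
        {"id": "login-password", "label": "Password", "type": "password", "value": ""},
        {"id": "news-email", "label": "Email address", "type": "email", "value": ""},
    ],
    "forms": [
        {"id": "login-form", "fields": ["login-email", "login-password"], "submit": "sign-in-btn"},
        {"id": "newsletter-form", "fields": ["news-email"], "submit": "subscribe-btn"},
    ],
}
print(forms_with_label(page, "Email address"))  # two matches: ambiguous on its own
print(forms_with_label(page, "Password"))       # one match: the login form
```

When a label matches more than one form, the agent can name the form's submit button, or another field unique to that form, to make its target unambiguous.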

Watch for alerts

After submitting a form, check the alerts array for validation errors, success messages, or warnings. This tells your agent whether the action succeeded without needing to parse page content.

The snapshot format is designed to be self-documenting. An LLM reading a snapshot for the first time can immediately understand the page structure and decide on the next action without any prior training on the format.