The Snapshot Format
Every action returns a structured snapshot your LLM can immediately reason about. This is BabelWrap's core innovation.
Why Snapshots?
Traditional browser automation gives agents raw HTML: thousands of lines of nested divs, scripts, and styles that an LLM has to parse before it can understand the page. BabelWrap solves this by converting any webpage into a clean, structured representation that contains only what matters: the content, the input fields, the clickable actions, the forms, and the navigation.
JSON Format
The snapshot is a JSON object with these top-level fields:
```json
{
  "url": "https://example.com/login",
  "title": "Sign In -- Example",
  "content": "Welcome back. Sign in to continue.",
  "inputs": [
    {
      "id": "email-field",
      "label": "Email address",
      "type": "text",
      "value": ""
    },
    {
      "id": "password-field",
      "label": "Password",
      "type": "password",
      "value": ""
    },
    {
      "id": "remember-me",
      "label": "Remember me",
      "type": "checkbox",
      "checked": false
    }
  ],
  "actions": [
    {
      "id": "sign-in-btn",
      "label": "Sign In",
      "type": "button",
      "primary": true
    },
    {
      "id": "forgot-password",
      "label": "Forgot password?",
      "type": "link",
      "href": "/reset"
    },
    {
      "id": "create-account",
      "label": "Create account",
      "type": "link",
      "href": "/signup"
    },
    {
      "id": "google-login",
      "label": "Continue with Google",
      "type": "button"
    }
  ],
  "navigation": ["Home", "Products", "Pricing", "Blog", "Docs"],
  "alerts": [],
  "forms": [
    {
      "id": "login-form",
      "fields": ["email-field", "password-field", "remember-me"],
      "submit": "sign-in-btn"
    }
  ],
  "tables": [],
  "lists": []
}
```
Field Reference
| Field | Type | Description |
|---|---|---|
| url | string | Current page URL |
| title | string | Document title |
| content | string | Main readable text content of the page |
| inputs | array | All form fields (text, email, password, checkbox, radio, select, textarea) |
| actions | array | All clickable elements (buttons, links) with labels and types |
| navigation | array | Site navigation link labels |
| alerts | array | Visible messages: errors, success banners, warnings |
| forms | array | Logical form groupings with their fields and submit buttons |
| tables | array | Structured table data extracted from the page |
| lists | array | Structured list data extracted from the page |
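As a rough illustration, the field reference can double as a lightweight sanity check on a parsed snapshot. This is a minimal sketch, not part of BabelWrap: the helper name `check_snapshot` and the idea of validating on the client side are assumptions. It only confirms that each documented top-level field is present with the documented JSON type.

```python
import json

# Top-level fields and their JSON types, taken from the Field Reference table.
EXPECTED_FIELDS = {
    "url": str,
    "title": str,
    "content": str,
    "inputs": list,
    "actions": list,
    "navigation": list,
    "alerts": list,
    "forms": list,
    "tables": list,
    "lists": list,
}


def check_snapshot(snapshot: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical client-side helper, not a BabelWrap API.
    """
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in snapshot:
            problems.append(f"missing field: {name}")
        elif not isinstance(snapshot[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(snapshot[name]).__name__}"
            )
    return problems


raw = """{"url": "https://example.com/login", "title": "Sign In -- Example",
"content": "Welcome back. Sign in to continue.", "inputs": [], "actions": [],
"navigation": [], "alerts": [], "forms": [], "tables": [], "lists": []}"""
print(check_snapshot(json.loads(raw)))  # prints []
```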
Input Fields
Each entry in inputs represents a form field:
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for this element (used internally for resolution) |
| label | string | Human-readable label (from `<label>`, placeholder, or aria-label) |
| type | string | Input type: text, email, password, checkbox, radio, select, textarea |
| value | string | Current value of the field (empty string if not filled) |
| checked | boolean | For checkboxes and radios: whether the field is checked |
| options | array | For select fields: list of available options |
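The `value` and `checked` properties make it easy to see which fields still need attention. A minimal sketch, assuming the snapshot has already been parsed into a Python dict; `unfilled_inputs` and `unchecked_toggles` are hypothetical helper names. Text-like fields count as unfilled when `value` is empty, while checkboxes and radios are reported separately because an unchecked box is often a valid choice.

```python
# Input types whose state lives in "checked" rather than "value".
TOGGLE_TYPES = {"checkbox", "radio"}


def unfilled_inputs(snapshot: dict) -> list[str]:
    """Return the labels of non-toggle inputs whose value is still empty."""
    return [
        field["label"]
        for field in snapshot["inputs"]
        if field["type"] not in TOGGLE_TYPES and not field.get("value")
    ]


def unchecked_toggles(snapshot: dict) -> list[str]:
    """Return the labels of checkboxes and radios that are not checked."""
    return [
        field["label"]
        for field in snapshot["inputs"]
        if field["type"] in TOGGLE_TYPES and not field.get("checked", False)
    ]


snapshot = {
    "inputs": [
        # Hypothetical filled value, for illustration only.
        {"id": "email-field", "label": "Email address", "type": "text", "value": "ada@example.com"},
        {"id": "password-field", "label": "Password", "type": "password", "value": ""},
        {"id": "remember-me", "label": "Remember me", "type": "checkbox", "checked": False},
    ]
}
print(unfilled_inputs(snapshot))    # ['Password']
print(unchecked_toggles(snapshot))  # ['Remember me']
```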
Action Elements
Each entry in actions represents a clickable element:
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| label | string | Visible text or aria-label |
| type | string | Element type: button, link, tab, menu-item |
| href | string or null | Link destination (for links only) |
| primary | boolean | Whether this appears to be the primary/main action on the page |
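Two common lookups on `actions` are "what is the main thing to click?" and "where do the links go?". The sketch below shows both under the assumption that `primary` may be absent on non-primary entries (as in the example JSON); `primary_action` and `link_targets` are hypothetical helper names, not BabelWrap functions.

```python
from typing import Optional


def primary_action(snapshot: dict) -> Optional[dict]:
    """Return the first action flagged as primary, or None if none is flagged."""
    # "primary" is omitted on non-primary actions, so default to False.
    return next((a for a in snapshot["actions"] if a.get("primary", False)), None)


def link_targets(snapshot: dict) -> dict[str, str]:
    """Map each link's label to its href, skipping entries without a destination."""
    return {
        a["label"]: a["href"]
        for a in snapshot["actions"]
        if a["type"] == "link" and a.get("href")
    }


snapshot = {
    "actions": [
        {"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True},
        {"id": "forgot-password", "label": "Forgot password?", "type": "link", "href": "/reset"},
        {"id": "create-account", "label": "Create account", "type": "link", "href": "/signup"},
        {"id": "google-login", "label": "Continue with Google", "type": "button"},
    ]
}
print(primary_action(snapshot)["label"])  # Sign In
print(link_targets(snapshot))  # {'Forgot password?': '/reset', 'Create account': '/signup'}
```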
Form Groupings
Each entry in forms groups related fields together:
| Property | Type | Description |
|---|---|---|
| id | string | Form identifier |
| fields | array | List of input IDs that belong to this form |
| submit | string | ID of the submit button/action for this form |
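Because `fields` and `submit` hold IDs rather than full entries, an agent usually needs to join them back to `inputs` and `actions`. A minimal sketch of that join, assuming a parsed snapshot dict; `resolve_form` is a hypothetical name, and raising `KeyError` on a dangling ID is a design choice of this sketch.

```python
def resolve_form(snapshot: dict, form_id: str) -> tuple[list[dict], dict]:
    """Return (input entries, submit action) for the form with the given id.

    Raises KeyError if the form, one of its fields, or its submit action is missing.
    """
    inputs_by_id = {field["id"]: field for field in snapshot["inputs"]}
    actions_by_id = {action["id"]: action for action in snapshot["actions"]}
    form = next((f for f in snapshot["forms"] if f["id"] == form_id), None)
    if form is None:
        raise KeyError(form_id)
    fields = [inputs_by_id[field_id] for field_id in form["fields"]]
    return fields, actions_by_id[form["submit"]]


snapshot = {
    "inputs": [
        {"id": "email-field", "label": "Email address", "type": "text", "value": ""},
        {"id": "password-field", "label": "Password", "type": "password", "value": ""},
        {"id": "remember-me", "label": "Remember me", "type": "checkbox", "checked": False},
    ],
    "actions": [{"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True}],
    "forms": [
        {"id": "login-form", "fields": ["email-field", "password-field", "remember-me"], "submit": "sign-in-btn"}
    ],
}
fields, submit = resolve_form(snapshot, "login-form")
print([f["label"] for f in fields])  # ['Email address', 'Password', 'Remember me']
print(submit["label"])  # Sign In
```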
Alerts
Each entry in alerts represents a visible page message:
| Property | Type | Description |
|---|---|---|
| text | string | Alert message text |
| type | string | Alert type: error, success, warning, info |
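If you want alert handling to fail loudly on unexpected data, the two properties above map naturally onto a typed structure. This is a sketch under the assumption that `type` is always one of the four documented values; `Alert` and `parse_alerts` are hypothetical names, and rejecting unknown types is a choice of this sketch rather than documented BabelWrap behavior.

```python
from typing import Literal, TypedDict

# The four alert types listed in the table.
ALERT_TYPES = {"error", "success", "warning", "info"}


class Alert(TypedDict):
    text: str
    type: Literal["error", "success", "warning", "info"]


def parse_alerts(raw: list[dict]) -> list[Alert]:
    """Validate raw alert entries against the documented shape."""
    alerts = []
    for entry in raw:
        if entry.get("type") not in ALERT_TYPES:
            raise ValueError(f"unknown alert type: {entry.get('type')!r}")
        alerts.append(Alert(text=str(entry["text"]), type=entry["type"]))
    return alerts


# Hypothetical alert, for illustration only.
print(parse_alerts([{"text": "Incorrect password.", "type": "error"}]))
# [{'text': 'Incorrect password.', 'type': 'error'}]
```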
Text Representation
When an MCP agent reads a snapshot, it sees a text representation optimized for LLMs:
```text
URL: https://example.com/login
TITLE: Sign In -- Example
CONTENT:
Welcome back. Sign in to continue.
INPUTS:
[email-field] Email address (text, empty)
[password-field] Password (password, empty)
[remember-me] Remember me (checkbox, unchecked)
ACTIONS:
[sign-in-btn] Sign In (button, primary)
[forgot-password] Forgot password? (link -> /reset)
[create-account] Create account (link -> /signup)
[google-login] Continue with Google (button)
NAVIGATION:
Home | Products | Pricing | Blog | Docs
FORMS:
[login-form] fields: email-field, password-field, remember-me -> submit: sign-in-btn
ALERTS:
(none)
```
This text format is far more compact than the page's raw HTML, so it fits comfortably in an LLM's context window while retaining all the information needed to decide on the next action.
Using Snapshots Effectively
Check the snapshot after every action
Every action returns a fresh snapshot. Use it to verify that the action worked and to decide what to do next. For example, after submitting a login form, check whether the url changed to a dashboard page or whether alerts contains an error message.
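The login check described above can be written as a small decision over the returned snapshot. A minimal sketch, assuming the dashboard lives under a `/dashboard` path (a hypothetical value; substitute your site's real landing page) and that any `error` alert means the attempt failed. `navigated_to` and `login_result` are illustrative names only.

```python
from urllib.parse import urlparse


def navigated_to(snapshot: dict, path_prefix: str) -> bool:
    """True if the snapshot's URL path starts with path_prefix."""
    return urlparse(snapshot["url"]).path.startswith(path_prefix)


def login_result(after: dict) -> str:
    """Classify the snapshot returned by a login submit.

    Hypothetical heuristic: an error alert means failure, landing on a
    /dashboard path means success, and anything else needs a closer look.
    """
    if any(alert["type"] == "error" for alert in after["alerts"]):
        return "failed"
    if navigated_to(after, "/dashboard"):
        return "succeeded"
    return "unclear"


# Hypothetical post-submit snapshots, trimmed to the fields the check reads.
ok = {"url": "https://example.com/dashboard", "alerts": []}
bad = {"url": "https://example.com/login",
       "alerts": [{"text": "Incorrect email or password.", "type": "error"}]}
print(login_result(ok), login_result(bad))  # succeeded failed
```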
Reference elements by their labels
When calling click, fill, or submit, describe elements using the label text from the snapshot. For example, if the snapshot shows [sign-in-btn] Sign In (button, primary), use "the Sign In button" as your target.
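The signatures of click, fill, and submit are not shown here, so the sketch below stops at producing the target description itself. `target_for` is a hypothetical helper that looks a label up in a parsed snapshot (ignoring case and surrounding spaces) and phrases it like the example, "the Sign In button"; the "field" wording for inputs is an assumption.

```python
from typing import Optional


def target_for(snapshot: dict, label: str) -> Optional[str]:
    """Return a natural-language target for the element with this label, or None."""
    wanted = label.strip().casefold()
    for field in snapshot["inputs"]:
        if field["label"].strip().casefold() == wanted:
            return f"the {field['label']} field"
    for action in snapshot["actions"]:
        if action["label"].strip().casefold() == wanted:
            kind = action["type"].replace("-", " ")  # e.g. "menu-item" -> "menu item"
            return f"the {action['label']} {kind}"
    return None


snapshot = {
    "inputs": [{"id": "email-field", "label": "Email address", "type": "text", "value": ""}],
    "actions": [
        {"id": "sign-in-btn", "label": "Sign In", "type": "button", "primary": True},
        {"id": "forgot-password", "label": "Forgot password?", "type": "link", "href": "/reset"},
    ],
}
print(target_for(snapshot, "sign in"))  # the Sign In button
```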
Use forms to understand page structure
The forms array shows which fields belong together and which button submits them. This is especially useful on pages with multiple forms.
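On a page with several forms, the question is often the reverse one: given a field, which form and submit button does it belong to? A minimal sketch, assuming a parsed snapshot; `form_for_field` is a hypothetical name, and the second form in the sample (a newsletter signup) is invented purely to show the multi-form case.

```python
from typing import Optional


def form_for_field(snapshot: dict, field_id: str) -> Optional[dict]:
    """Return the form whose fields include field_id, or None if no form does."""
    return next((form for form in snapshot["forms"] if field_id in form["fields"]), None)


snapshot = {
    "forms": [
        {"id": "login-form", "fields": ["email-field", "password-field", "remember-me"], "submit": "sign-in-btn"},
        # Hypothetical second form, for illustration only.
        {"id": "newsletter-form", "fields": ["newsletter-email"], "submit": "subscribe-btn"},
    ]
}
print(form_for_field(snapshot, "newsletter-email")["submit"])  # subscribe-btn
```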
Watch for alerts
After submitting a form, check the alerts array for validation errors, success messages, or warnings. This tells your agent whether the action succeeded without needing to parse page content.
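A simple way to turn the alerts array into a yes/no signal is to let any error win, then fall back to a success message. This is a sketch: `submission_status` is a hypothetical name, and treating warning and info alerts as non-decisive (status `"unknown"`) is an assumption you may want to change for your site.

```python
def submission_status(snapshot: dict) -> tuple[str, list[str]]:
    """Summarize alerts after a submit as (status, messages).

    status is "error" if any error alert is present, otherwise "success" if a
    success alert is present, otherwise "unknown" (no decisive alert).
    """
    by_type: dict[str, list[str]] = {}
    for alert in snapshot["alerts"]:
        by_type.setdefault(alert["type"], []).append(alert["text"])
    for status in ("error", "success"):
        if status in by_type:
            return status, by_type[status]
    return "unknown", []


# Hypothetical alerts, for illustration only.
after = {"alerts": [{"text": "Password reset email sent.", "type": "success"}]}
print(submission_status(after))  # ('success', ['Password reset email sent.'])
```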