Architecture
How BabelWrap translates natural language into browser interactions and returns structured snapshots.
System Overview
BabelWrap sits between your AI agent and the target website. It provides two interfaces (REST API and MCP server) that both feed into the same session manager, which coordinates the Playwright browser engine and the LLM-powered snapshot proxy.
Components
BabelWrap Proxy (Snapshot Renderer)
Converts any webpage's DOM into a structured, LLM-readable snapshot. It auto-scrolls to trigger lazy-loaded content, waits for loading indicators to disappear, and then extracts all inputs, buttons, links, forms, tables, alerts, and navigation elements into a clean JSON structure. The snapshot is designed so an LLM can immediately understand the page without parsing HTML.
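To make the idea of a structured snapshot concrete, here is a minimal sketch of what such a structure could look like. The class and field names (`Snapshot`, `SnapshotElement`, `inputs`, `buttons`, and so on) are illustrative assumptions, not BabelWrap's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SnapshotElement:
    # Hypothetical fields: a stable identifier plus the human-readable label.
    id: str
    label: str


@dataclass
class Snapshot:
    # Hypothetical shape; the real snapshot may group or name things differently.
    url: str
    title: str
    inputs: list[SnapshotElement] = field(default_factory=list)
    buttons: list[SnapshotElement] = field(default_factory=list)
    links: list[SnapshotElement] = field(default_factory=list)
    alerts: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the snapshot, nested elements included, to JSON text."""
        return json.dumps(asdict(self), indent=2)


snapshot = Snapshot(
    url="https://example.com/login",
    title="Log in",
    inputs=[SnapshotElement("email", "Email address")],
    buttons=[SnapshotElement("btn-login", "Sign In")],
)
print(snapshot.to_json())
```

Grouping elements by role rather than by DOM position is what lets a model read the page without walking the HTML tree.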
Playwright Engine
Executes real browser interactions headlessly using Chromium. Handles navigation, clicks, form fills, file uploads, keyboard input, and screenshots. Sessions draw on a shared browser pool, so each one is lightweight: no session needs its own full Chromium instance.
Session Manager
Maintains isolated browser contexts, each with its own cookies, localStorage, and page state. Sessions persist across actions and expire automatically after 1 hour of inactivity. Cookie injection and extraction enable authentication persistence across sessions.
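The inactivity rule can be sketched as follows. This is a simplified in-memory model, not the actual Session Manager: `SessionStore`, `Session`, and the injectable `clock` are hypothetical names, and a real implementation would hold a Playwright browser context rather than a plain cookie dictionary.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

SESSION_TTL_SECONDS = 60 * 60  # 1 hour of inactivity, as described above


@dataclass
class Session:
    session_id: str
    last_used: float
    cookies: dict[str, str] = field(default_factory=dict)  # stand-in for context state


class SessionStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._sessions: dict[str, Session] = {}

    def create(self, session_id: str) -> Session:
        session = Session(session_id, last_used=self._clock())
        self._sessions[session_id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        """Return the session and refresh its timer, or None if unknown or expired."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if self._clock() - session.last_used > SESSION_TTL_SECONDS:
            del self._sessions[session_id]  # expired: drop its state
            return None
        session.last_used = self._clock()  # every action counts as activity
        return session
```

Because each lookup refreshes `last_used`, a session that keeps receiving actions never expires; only idle ones are dropped.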
Element Resolver
The core innovation: it maps natural language descriptions to DOM elements using a three-tier approach, tried in order:
- Direct match: exact ID or label match (fastest, no LLM call)
- Cache hit: Redis-cached mapping from a (snapshot_hash, target) pair to the resolved element
- LLM resolution: Claude Haiku matches the description to the most likely element, with confidence scoring and ambiguity detection
Caching means repeated interactions with the same page structure avoid LLM calls entirely, keeping costs low and latency minimal.
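The three tiers can be sketched like this. It is a minimal illustration under stated assumptions: a plain dictionary stands in for Redis, `llm_match` is a hypothetical callable standing in for the Claude Haiku call, and `Element`, `snapshot_hash`, and `resolve` are names invented for the example. Confidence scoring and ambiguity detection are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    id: str
    label: str


def snapshot_hash(elements: list[Element]) -> str:
    """Hash the page structure so identical layouts share cache entries."""
    payload = "|".join(f"{e.id}:{e.label}" for e in elements)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def resolve(
    target: str,
    elements: list[Element],
    cache: dict[tuple[str, str], str],  # stand-in for Redis
    llm_match: Callable[[str, list[Element]], Optional[str]],  # hypothetical LLM call
) -> tuple[Optional[str], str]:
    """Return (element_id, tier), where tier names the step that answered."""
    wanted = target.strip().lower()

    # Tier 1: exact ID or label match, no LLM call.
    for element in elements:
        if wanted in (element.id.lower(), element.label.lower()):
            return element.id, "direct"

    # Tier 2: cached mapping keyed by (snapshot_hash, target).
    key = (snapshot_hash(elements), wanted)
    if key in cache:
        return cache[key], "cache"

    # Tier 3: ask the LLM matcher, then cache its answer for next time.
    element_id = llm_match(target, elements)
    if element_id is not None:
        cache[key] = element_id
    return element_id, "llm"
```

Keying the cache on the snapshot hash rather than the URL means a changed page layout naturally misses the cache and falls through to the LLM tier.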
Site Mapper Pipeline
Orchestrates the full site mapping flow: an AI explorer agent discovers the site structure, a validator tests the generated recipes, a tool generator converts them into typed MCP tools, and the results are persisted to PostgreSQL. On every server restart, all ready site models are reloaded and their tools re-registered on the MCP server.
Explorer Agent
Uses the agno framework with Claude Sonnet 4 to intelligently browse a target website. The agent has 11 tools: 6 BabelWrap browsing tools (create session, navigate, click, fill, extract, read page) and 5 reporting tools (report page type, report entity, report recipe, trace flow, mark complete). It visits roughly 10 to 20 pages to discover entities, page types, and action flows, then records step-by-step recipes.
Recipe Executor
Executes stored recipes by replaying steps against live websites. On the happy path, steps execute as blind replay with parameter interpolation (no LLM involved). On failure, triggers self-healing: snapshots the current page, asks an LLM to suggest a corrected target, retries, and persists the fix. Includes 4-layer authentication support: cookie injection, stored credentials, login redirect detection, and expired cookie refresh.
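The replay-then-heal loop might look roughly like the sketch below. The step format (a dictionary of strings with `{placeholder}` fields) and the names `interpolate`, `run_recipe`, `perform`, and `suggest_target` are assumptions made for illustration; `perform` stands in for the browser action and `suggest_target` for the LLM call. Authentication handling is omitted.

```python
from typing import Callable

Step = dict[str, str]  # assumed shape, e.g. {"action": "fill", "target": ..., "value": ...}


def interpolate(step: Step, params: dict[str, str]) -> Step:
    """Fill {placeholders} in each step field from the caller's parameters."""
    return {key: value.format(**params) for key, value in step.items()}


def run_recipe(
    steps: list[Step],
    params: dict[str, str],
    perform: Callable[[Step], bool],       # hypothetical: runs one step, True on success
    suggest_target: Callable[[Step], str],  # hypothetical: LLM-suggested corrected target
) -> list[Step]:
    """Replay steps; on failure, retry once with a suggested target.

    Returns the recipe with any healed steps, ready to be persisted.
    """
    healed: list[Step] = []
    for raw in steps:
        step = interpolate(raw, params)
        if perform(step):  # happy path: blind replay, no LLM involved
            healed.append(raw)
            continue
        # Self-healing: ask for a corrected target, retry, and keep the fix.
        fixed = {**raw, "target": suggest_target(step)}
        if not perform(interpolate(fixed, params)):
            raise RuntimeError(f"step failed after healing: {step}")
        healed.append(fixed)
    return healed
```

Returning the healed recipe instead of mutating the input keeps the original steps intact until the caller decides to persist the fix.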
Tool Generator
Converts validated recipes into typed Python functions with domain-prefixed names (e.g., linkedin_search_jobs). Functions are dynamically compiled with explicit typed signatures so MCP clients (Claude Desktop, Cursor, etc.) can discover and call them with proper parameter names and types.
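One way to give a dynamically built function an explicit typed signature is the standard library's `inspect.Signature`, sketched below. This is not necessarily how the Tool Generator does it; `make_tool` and the `run` callback are hypothetical names, and the parameter names in the usage example are invented.

```python
import inspect
from typing import Any, Callable


def make_tool(
    domain: str,
    recipe_name: str,
    params: dict[str, type],
    run: Callable[[str, dict[str, Any]], Any],  # hypothetical recipe runner
) -> Callable[..., Any]:
    """Build a function named <domain>_<recipe_name> with an explicit signature."""
    signature = inspect.Signature(
        [
            inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD, annotation=kind)
            for name, kind in params.items()
        ]
    )

    def tool(*args: Any, **kwargs: Any) -> Any:
        # bind() raises TypeError on missing or unexpected arguments,
        # just as a normally defined function would.
        bound = signature.bind(*args, **kwargs)
        return run(recipe_name, dict(bound.arguments))

    tool.__name__ = f"{domain}_{recipe_name}"
    tool.__signature__ = signature  # what inspect.signature() reports to clients
    tool.__annotations__ = dict(params)
    return tool


search_jobs = make_tool(
    "linkedin",
    "search_jobs",
    {"query": str, "location": str},
    run=lambda name, arguments: (name, arguments),
)
print(search_jobs.__name__, inspect.signature(search_jobs))
```

Exposing real parameter names and annotations, rather than a generic `**kwargs`, is what lets introspection-based clients show the tool's arguments.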
Request Flow
Here's what happens when your agent sends a click action:
- Agent sends POST /v1/sessions/{id}/click with {"target": "the Sign In button"}
- API authenticates the request and looks up the session
- Session Manager retrieves the active browser context
- BabelWrap Proxy takes a snapshot of the current page
- Element Resolver maps "the Sign In button" to a specific DOM element (via cache or LLM)
- Playwright Engine clicks the resolved element
- BabelWrap Proxy takes a new snapshot of the page after the click
- API returns the snapshot to the agent with action metadata (duration, success status)
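From the agent's side, the first step of the flow above could be built as shown here. The endpoint path and request body come from the flow; the host name, the bearer-token header, and the helper name `build_click_request` are assumptions for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real API address


def build_click_request(session_id: str, target: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a click action."""
    body = json.dumps({"target": target}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/sessions/{session_id}/click",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


request = build_click_request("abc123", "the Sign In button", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request)` would return the post-click snapshot and action metadata described in the last step.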
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python 3.12+ |
| API Framework | FastAPI (async) |
| Browser Engine | Playwright (Chromium) |
| LLM Client | Anthropic SDK (Claude Haiku) |
| MCP Server | FastMCP |
| Database | PostgreSQL (asyncpg + SQLAlchemy) |
| Cache / State | Redis |
| Billing | Stripe (usage-based metering) |
| Explorer Agent | agno + Claude Sonnet 4 |
| Containerization | Docker |