Stagehand v3 + Browserbase: Production Browser Agents

The definitive guide to building production browser agents with Stagehand v3 and Browserbase. It covers the four primitives (act, extract, observe, agent), the direct CDP rewrite, local Chromium dev loops, Browserbase production deployment with session replay and CAPTCHA solving, Claude Agent SDK integration, MCP server setup, anti-bot strategies, benchmarks vs Puppeteer/Playwright, and migration paths.

By Jose Nobile | 2026-04-20 | 18 min read

Why Selector-Based Automation Breaks at Scale

Traditional browser automation with Puppeteer and Playwright relies on CSS selectors and XPath expressions to locate elements. This works well for static pages, but modern web applications change their DOM structure frequently -- A/B tests swap components, framework upgrades rename class names, dynamic rendering produces different markup per session, and shadow DOM boundaries hide elements from outer queries. Over 30 days on live production sites, Playwright and Puppeteer scripts require 15-25% of their selectors to be updated just to maintain baseline functionality.

The brittleness compounds at scale. A team maintaining 500 E2E tests against a rapidly evolving SPA can spend more time fixing selectors than writing new automation. Worse, selector failures can be silent: a test may pass because the element was not found and the step or assertion degraded to a no-op, not because the feature works. AI-powered automation replaces fragile selectors with natural language instructions that adapt to layout changes. Instead of page.click('#submit-btn-v3'), you write page.act('click the submit button'). The AI interprets the current DOM and finds the matching element regardless of its CSS class, data attribute, or position in the tree.
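The silent-failure mode described above can be sketched without any browser at all. In this minimal illustration, `Lookup`, `clickLenient`, and `clickStrict` are hypothetical names, and the lookup function stands in for a selector query such as page.$(selector).

```typescript
// Hypothetical stand-in for a selector query: returns null when nothing matches.
type Lookup = (selector: string) => { click(): void } | null;

// Silent: a missing element turns the step into a no-op, so the test keeps going.
function clickLenient(find: Lookup, selector: string): void {
  find(selector)?.click();
}

// Loud: a missing element fails the step immediately with a clear message.
function clickStrict(find: Lookup, selector: string): void {
  const element = find(selector);
  if (element === null) {
    throw new Error(`Element not found: ${selector}`);
  }
  element.click();
}
```

Whichever automation style you use, the strict form is the safer default: a renamed selector then shows up as a failed step rather than as a passing test that exercised nothing.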

Stagehand occupies the pragmatic middle ground between fully manual selector-based automation and opaque full-agent solutions. It gives you deterministic code where you want it and AI-powered flexibility where the page is unpredictable, which is exactly the balance production workflows require. You keep control over navigation and setup, and let the AI handle the parts of the page that change.

The Four Stagehand Primitives

act()

Perform browser actions from a plain-English instruction: click, fill, navigate, scroll, hover, select. The AI interprets the current page and executes the action on the correct element. Use act() for single, deterministic interactions where you want precise control. Example: page.act('click the submit button') or page.act('fill the email field with test@example.com').

extract()

Pull structured data from any page using Zod schema validation. Describe what you want and define the output shape -- Stagehand reads the DOM, identifies the relevant content, and returns it as a typed object. Use extract() for scraping product listings, reading table data, or capturing form state. Example: page.extract({ instruction: 'extract the product title and price', schema: z.object({ price: z.string(), title: z.string() }) }).

observe()

Surfaces what is actionable on a page before you commit to an action. Returns a list of interactive elements with their descriptions and possible interactions. Use observe() for dynamic pages where you need to understand what the page offers before deciding what to do. This is the reconnaissance step -- observe first, then act based on what you find.
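The observe-first, act-second pattern can be sketched as a small helper. This is a minimal sketch against a hypothetical `PageLike` interface rather than Stagehand's real types; the `ObservedAction` shape (a description plus a selector) and the helper name `actIfAvailable` are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; not Stagehand's actual type definitions.
interface ObservedAction {
  description: string;
  selector: string;
}

interface PageLike {
  observe(instruction: string): Promise<ObservedAction[]>;
  act(instruction: string): Promise<void>;
}

// Observe first, then act only if a matching element was surfaced.
async function actIfAvailable(page: PageLike, keyword: string): Promise<boolean> {
  const candidates = await page.observe(`Find elements related to "${keyword}"`);
  const match = candidates.find((candidate) =>
    candidate.description.toLowerCase().includes(keyword.toLowerCase())
  );
  if (match === undefined) {
    return false; // nothing actionable was found; the caller decides what to do
  }
  await page.act(`Click the element described as "${match.description}"`);
  return true;
}
```

Returning a boolean keeps the decision with the caller, which can log the miss, try a different keyword, or hand the step to agent().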

agent()

Runs multi-step workflows autonomously when you need end-to-end execution. Give the agent a high-level goal -- "navigate to the pricing page, extract the enterprise plan price, and take a screenshot" -- and it chains act, extract, and observe calls internally to accomplish the task. Use agent() for complex flows like checkout processes, form wizards, and multi-page navigation. Supports Claude and OpenAI models.

When to Use Each Primitive

Use act(), extract(), and observe() when you want precise, step-by-step control over browser actions: each call performs one bounded operation, so its behavior is easier to predict, test, and debug. Use agent() when you need multi-step workflows executed autonomously, like navigating a complex checkout flow or filling a multi-page form. Most production teams combine both approaches: agent() for exploration and discovery, individual primitives for critical paths where reliability matters most.

The v3 Rewrite: Direct CDP, 44% Faster, MIT License

Stagehand v3, released in October 2025, is a ground-up rewrite that removes the Playwright dependency entirely. Instead of routing commands through Playwright's protocol layer, v3 talks directly to the browser using the Chrome DevTools Protocol (CDP). This eliminates one entire network hop per command and minimizes round-trip time (RTT), resulting in 44%+ faster execution across all scenarios. The performance improvement is especially dramatic for deeply nested iframes and shadow DOM interactions -- some of the hardest surfaces in modern web automation.
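To make "talking directly to the browser over CDP" concrete: CDP commands are JSON messages sent over a WebSocket, each carrying an id, a method name, and params, and the browser replies with a message carrying the same id. The sketch below shows only that framing. `CdpFramer` is a hypothetical helper, not Stagehand's internal driver; Page.navigate and Runtime.evaluate are real CDP methods.

```typescript
// Illustrative CDP message framing; not Stagehand's internal driver code.
interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

class CdpFramer {
  private nextId = 1;
  private pending = new Map<number, string>();

  // Serialize a command the way it would be sent over the DevTools WebSocket.
  encode(method: string, params: Record<string, unknown> = {}): string {
    const command: CdpCommand = { id: this.nextId++, method, params };
    this.pending.set(command.id, method);
    return JSON.stringify(command);
  }

  // Match a raw response frame back to the method that produced it.
  resolve(rawResponse: string): { method: string; result: unknown } | null {
    const message = JSON.parse(rawResponse) as { id?: number; result?: unknown };
    if (message.id === undefined) return null; // an event, not a command response
    const method = this.pending.get(message.id);
    if (method === undefined) return null; // unknown or already-resolved id
    this.pending.delete(message.id);
    return { method, result: message.result };
  }
}

const framer = new CdpFramer();
const navigateFrame = framer.encode("Page.navigate", { url: "https://example.com" });
const evaluateFrame = framer.encode("Runtime.evaluate", { expression: "document.title" });
```

Every layer placed between your code and this message exchange adds serialization and round-trip cost per command, which is the overhead the v3 rewrite targets.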

The new modular driver system allows Stagehand to work seamlessly with Puppeteer or any driver built on CDP. This means you can drop Stagehand into existing Puppeteer projects without rewriting your infrastructure. The architecture is AI-native from the ground up: v3 automatically caches discovered elements and actions that you can reuse without additional LLM inference cost or latency. Once the AI identifies a button's location, subsequent clicks on that button skip the LLM call entirely.

Stagehand is fully open source under the MIT license, making it safe for commercial use without legal friction. The multi-language expansion is significant: beyond the original TypeScript SDK, Stagehand is now available in Python, Go, Java, Ruby, and Rust. The Python SDK has full v3 feature parity, making it a first-class citizen for data science and scraping workflows.

v3 also supports all major model providers via the Vercel AI SDK -- Anthropic Claude, OpenAI GPT-4o, and Google Gemini. You choose which model powers your browser agent based on your use case. Claude excels at high-level reasoning and dynamic decision-making, while GPT-4o and GPT-4o mini are well-suited for executing specific, targeted browser actions.

Local Chromium Dev Loop Setup

Start development locally before connecting to Browserbase for production. The local setup runs a real Chromium instance on your machine with full CDP access, giving you instant feedback during development without cloud latency or costs.

# Install Stagehand (TypeScript)
npm install @browserbasehq/stagehand

# Install Stagehand (Python)
pip install stagehand-py

A minimal local script that demonstrates all four primitives.

// local-dev.ts -- Stagehand local Chromium dev loop
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "LOCAL",            // Run local Chromium, no cloud
  modelName: "claude-sonnet-4-20250514",
  modelClientOptions: {
    apiKey: process.env.ANTHROPIC_API_KEY
  }
});

await stagehand.init();
const page = stagehand.page;

// Navigate with standard CDP -- fast, deterministic
await page.goto("https://news.ycombinator.com");

// observe() -- what can we do on this page?
const actions = await page.observe(
  "What are the interactive elements on this page?"
);
console.log("Available actions:", actions);

// act() -- click using natural language
await page.act("Click on the first story link");

// extract() -- pull structured data
const data = await page.extract({
  instruction: "Extract the article title and all comments",
  schema: z.object({
    title: z.string(),
    comments: z.array(z.string())
  })
});
console.log("Extracted:", data);

// agent() -- autonomous multi-step workflow
const agent = stagehand.agent();
await agent.execute(
  "Go back to the homepage, find the highest-scored post, " +
  "click into it, and extract the submission URL"
);

await stagehand.close();

For Python, the equivalent setup is just as straightforward.

# local_dev.py -- Stagehand local dev loop (Python)
import asyncio
import os
from stagehand import Stagehand, StagehandConfig

async def main():
    config = StagehandConfig(
        env="LOCAL",
        model_name="claude-sonnet-4-20250514",
        model_client_options={
            "api_key": os.environ["ANTHROPIC_API_KEY"]
        }
    )
    stagehand = Stagehand(config)
    await stagehand.init()
    page = stagehand.page

    await page.goto("https://example.com")
    await page.act("Click the 'More information' link")

    result = await page.extract(
        instruction="Extract the page title and first paragraph",
        schema={"title": "string", "body": "string"}
    )
    print(result)

    await stagehand.close()

asyncio.run(main())

Run your local scripts with ANTHROPIC_API_KEY=sk-... npx ts-node local-dev.ts. The browser launches visibly by default in LOCAL mode, so you can watch the AI interact with the page in real time. Set headless: true in the config to suppress the browser window for CI environments.

Connecting Browserbase for Production

Browserbase provides the production infrastructure that turns your local Stagehand scripts into scalable, reliable browser agents. Switching from local to Browserbase requires changing one configuration line -- set env: "BROWSERBASE" -- and providing your API key and project ID. Browserbase gives you headless browsers with built-in features that would take months to build yourself.

// production.ts -- Stagehand with Browserbase
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "BROWSERBASE",      // Switch from LOCAL to cloud
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  modelName: "claude-sonnet-4-20250514",
  modelClientOptions: {
    apiKey: process.env.ANTHROPIC_API_KEY
  }
});

await stagehand.init();
// Everything else stays exactly the same
const page = stagehand.page;
await page.goto("https://target-site.com");
await page.act("Accept the cookie banner");
const data = await page.extract({
  instruction: "Extract all product prices and names",
  schema: z.object({
    products: z.array(z.object({
      name: z.string(),
      price: z.string()
    }))
  })
});
await stagehand.close();
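Because LOCAL and BROWSERBASE differ only in configuration, the choice can be driven by environment variables so the same script runs in both places. The sketch below is one way to do that. `buildOptions` and `StagehandOptions` are hypothetical names, and the field names simply mirror the constructor options used in this guide's examples.

```typescript
// Field names mirror the examples in this guide; the helper itself is hypothetical.
type EnvVars = Record<string, string | undefined>;

interface StagehandOptions {
  env: "LOCAL" | "BROWSERBASE";
  apiKey?: string;
  projectId?: string;
  modelName: string;
  modelClientOptions: { apiKey: string | undefined };
}

function buildOptions(vars: EnvVars): StagehandOptions {
  const base = {
    modelName: "claude-sonnet-4-20250514",
    modelClientOptions: { apiKey: vars["ANTHROPIC_API_KEY"] },
  };
  const browserbaseKey = vars["BROWSERBASE_API_KEY"];
  const browserbaseProject = vars["BROWSERBASE_PROJECT_ID"];
  // Use the cloud only when both Browserbase credentials are present.
  if (browserbaseKey && browserbaseProject) {
    return {
      ...base,
      env: "BROWSERBASE",
      apiKey: browserbaseKey,
      projectId: browserbaseProject,
    };
  }
  return { ...base, env: "LOCAL" };
}
```

In a script you would pass the result to the constructor, for example new Stagehand(buildOptions(process.env)), so a developer laptop without Browserbase credentials falls back to local Chromium automatically.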

Browserbase Production Features

Session Replay

Every browser session is recorded with full DOM snapshots, network requests, console logs, and screenshots. When an automation fails in production, replay the exact session to see what the agent saw, what actions it took, and where it went wrong. Session Inspector provides command-level logging for debugging individual steps.

CAPTCHA Solving

CAPTCHA solving is enabled by default for all Browserbase sessions. Through direct partnerships with CAPTCHA providers, challenges are resolved automatically in 5-30 seconds so your sessions continue without interruption. No third-party CAPTCHA API subscriptions needed.

Residential Proxies

Configure residential proxies for geo-specific automation with rotating proxy pools for large-scale operations. Built-in IP rotation and management prevents rate limiting and ensures consistent access. Included in all paid plans with per-GB allocation.

Concurrent Sessions

Run multiple browser sessions simultaneously for parallel scraping, testing, and automation workflows. Plans range from 3 concurrent browsers (Developer) to 50 (Startup) to custom capacity (Enterprise). Session orchestration handles queuing and lifecycle management automatically.
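On the client side it still helps to cap how many sessions you open at once so you stay within your plan's concurrency limit. The following is a generic limiter sketch, not a Browserbase API. `runWithLimit` is a hypothetical helper, and each task is assumed to open, use, and close its own session.

```typescript
// Run async tasks with at most `limit` in flight, preserving result order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // claimed synchronously, before any await
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With a limit of 3, a batch of 100 URLs keeps three sessions busy until the queue drains, and a rejected task rejects the whole batch so failures are not swallowed.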

Live View

Embed live browser sessions into your own applications. Watch agents navigate in real time, useful for demos, customer-facing automation, and debugging. The Live View API provides embeddable iframes with full interactivity.

Action Caching

Stagehand v3 caches discovered element locations and action mappings. On repeat visits to the same page structure, cached actions execute without an LLM call, dramatically reducing latency and cost. This is automatic -- no cache configuration needed.
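The idea behind action caching can be illustrated with a toy lookup table. This is a conceptual sketch only: Stagehand's cache is built in and needs no code like this, and the key format, the `ActionCache` class, and the resolver callback that stands in for an LLM call are all assumptions.

```typescript
// Hypothetical cache sketch; Stagehand's built-in caching needs no such code.
type Resolver = (instruction: string) => string; // stands in for an LLM call

class ActionCache {
  private selectors = new Map<string, string>();
  private resolveWithLlm: Resolver;
  llmCalls = 0;

  constructor(resolveWithLlm: Resolver) {
    this.resolveWithLlm = resolveWithLlm;
  }

  // Return the selector for (url, instruction), resolving it once on a miss.
  lookup(url: string, instruction: string): string {
    const key = `${url}::${instruction}`;
    const cached = this.selectors.get(key);
    if (cached !== undefined) return cached; // cache hit: no LLM call
    this.llmCalls += 1;
    const selector = this.resolveWithLlm(instruction);
    this.selectors.set(key, selector);
    return selector;
  }
}
```

The counter makes the cost behavior visible: repeating the same instruction on the same page increments llmCalls once, no matter how many times the action runs.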

Pricing Overview

Browserbase offers a free tier with 1 browser hour for experimentation. The Developer plan at $20/month includes 100 browser hours and 3 concurrent browsers, with $0.12/hr overage. The Startup plan at $99/month provides ~500 hours, 5 GB of proxies, and 50 concurrent browsers, with $0.10/hr overage. New APIs include Search API ($7/1K requests) and Fetch API ($1/1K requests). Enterprise plans offer custom capacity with SLA guarantees, SSO, and dedicated support.
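The overage arithmetic is easy to get wrong when comparing plans, so here is a small worked sketch using only the figures quoted above. The `Plan` shape and `monthlyCostUsd` are hypothetical helpers, the Startup allowance is treated as exactly 500 hours even though the text gives it as approximate, and proxy and API charges are ignored.

```typescript
interface Plan {
  baseUsd: number;
  includedHours: number;
  overagePerHourUsd: number;
}

// Figures taken from the pricing paragraph above.
const developerPlan: Plan = { baseUsd: 20, includedHours: 100, overagePerHourUsd: 0.12 };
const startupPlan: Plan = { baseUsd: 99, includedHours: 500, overagePerHourUsd: 0.1 };

// Base fee plus overage for any hours beyond the included allowance.
function monthlyCostUsd(plan: Plan, hoursUsed: number): number {
  const overageHours = Math.max(0, hoursUsed - plan.includedHours);
  return plan.baseUsd + overageHours * plan.overagePerHourUsd;
}

// 150 hours on Developer: 20 + (150 - 100) * 0.12
const developerAt150 = monthlyCostUsd(developerPlan, 150);
// 150 hours on Startup stays within the allowance, so only the base fee applies.
const startupAt150 = monthlyCostUsd(startupPlan, 150);
```

On browser hours alone, 150 hours costs about $26 on Developer versus $99 on Startup, so the larger plan is justified by concurrency and proxy allowance rather than by hourly price at that volume.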

Combining Stagehand with Claude Agent SDK

The Claude Agent SDK provides the orchestration layer for building production-grade AI agents. Stagehand slots in as the browser tool within a Claude Agent SDK workflow, giving your agent the ability to browse the web, fill forms, extract data, and interact with any website. The combination is powerful: Claude handles reasoning and decision-making while Stagehand handles browser execution, with cached actions keeping repeated steps fast.

// agent-with-browser.ts -- Claude tool use (Anthropic TypeScript SDK) + Stagehand
import Anthropic from "@anthropic-ai/sdk";
import { Stagehand } from "@browserbasehq/stagehand";

// Initialize Stagehand as the browser tool
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  modelName: "claude-sonnet-4-20250514",
  modelClientOptions: {
    apiKey: process.env.ANTHROPIC_API_KEY
  }
});
await stagehand.init();

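// Define a tool that the Claude agent can invoke
const browserTool: Anthropic.Tool = {
  name: "browse_web",
  description: "Navigate to a URL, perform an action, and optionally extract data",
  input_schema: {
    type: "object",
    properties: {
      url: { type: "string", description: "URL to navigate to" },
      action: { type: "string", description: "Action to perform" },
      extract_instruction: {
        type: "string",
        description: "Optional: what data to extract after the action"
      }
    },
    required: ["url", "action"]
  }
};

// Shape of the tool input Claude sends back (mirrors input_schema above)
type BrowseInput = { url: string; action: string; extract_instruction?: string };

// The agent loop: Claude reasons, Stagehand executes, results return to Claude
const client = new Anthropic();
const messages: Anthropic.MessageParam[] = [{
  role: "user",
  content: "Research the pricing for Vercel, Netlify, " +
           "and Cloudflare Pages. Compare them in a table."
}];

while (true) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 4096,
    tools: [browserTool],
    messages
  });
  messages.push({ role: "assistant", content: response.content });

  // No more tool calls: print Claude's final answer and stop
  if (response.stop_reason !== "tool_use") {
    for (const block of response.content) {
      if (block.type === "text") console.log(block.text);
    }
    break;
  }

  // Handle tool calls from Claude
  const toolResults: Anthropic.ToolResultBlockParam[] = [];
  for (const block of response.content) {
    if (block.type !== "tool_use" || block.name !== "browse_web") continue;
    const input = block.input as BrowseInput;
    const page = stagehand.page;
    await page.goto(input.url);
    await page.act(input.action);
    // Without a Zod schema, extract() returns free-form text for Claude to read
    const data = input.extract_instruction
      ? await page.extract(input.extract_instruction)
      : { status: "action completed" };
    toolResults.push({
      type: "tool_result",
      tool_use_id: block.id,
      content: JSON.stringify(data)
    });
  }
  // Feed results back to Claude for reasoning
  messages.push({ role: "user", content: toolResults });
}
await stagehand.close();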

This pattern enables use cases that neither tool can achieve alone. Claude reasons about what information it needs, decides which websites to visit, formulates natural-language instructions for Stagehand, and synthesizes the extracted data into a coherent response. Stagehand handles the messy reality of web pages -- dynamic content, pop-ups, cookie banners, lazy loading -- while Claude handles the high-level reasoning.

For teams already using the Claude Agent SDK, Stagehand replaces the need for custom browser tool implementations. Instead of maintaining a bespoke Puppeteer wrapper with manual selector management, you get AI-powered browser interaction that adapts to page changes automatically. The integration works with both local Chromium (for development) and Browserbase (for production), so your agent code stays identical across environments.

Stagehand MCP Server Integration

The Browserbase MCP server exposes Stagehand's browser automation capabilities as MCP tools, allowing any MCP-compatible AI client (Claude Code, Claude Desktop, VS Code) to browse the web, interact with pages, extract data, and take screenshots. This is the fastest path from zero to browser-capable AI agent.

// .mcp.json -- Browserbase MCP server (Streamable HTTP)
{
  "mcpServers": {
    "browserbase": {
      "type": "streamable-http",
      "url": "https://mcp.browserbase.com",
      "headers": {
        "Authorization": "Bearer ${BROWSERBASE_API_KEY}",
        "X-BB-Project-Id": "${BROWSERBASE_PROJECT_ID}"
      }
    }
  }
}

// Alternative: local STDIO mode
{
  "mcpServers": {
    "browserbase": {
      "command": "npx",
      "args": ["-y", "@browserbasehq/mcp-server-browserbase"],
      "env": {
        "BROWSERBASE_API_KEY": "${BROWSERBASE_API_KEY}",
        "BROWSERBASE_PROJECT_ID": "${BROWSERBASE_PROJECT_ID}"
      }
    }
  }
}

Once configured, your AI client gains access to browser tools: navigate to URLs, interact with page elements via natural language, extract structured content, take screenshots, and manage multiple concurrent sessions. The MCP server handles session lifecycle, proxy configuration, CAPTCHA solving, and session recording transparently.

There is also a fully local MCP server option that runs Stagehand with a local Chromium instance, requiring no Browserbase account. This is ideal for development, testing, and environments where cloud browser access is restricted. Install via npx @browserbasehq/mcp-server-browserbase --local and configure it in your MCP settings with the LOCAL environment flag.

The MCP approach is particularly powerful for Claude Code workflows. Instead of writing a custom Stagehand script, you simply ask Claude Code to "open this URL and extract the pricing table" -- it routes the request through the Browserbase MCP server, Stagehand executes the extraction, and the results flow back into your conversation context for analysis.

Anti-Bot and Identity Strategies

Production browser agents face aggressive bot detection from services like Cloudflare Bot Management, DataDome, PerimeterX, and Akamai Bot Manager. Browserbase solves this through a fundamentally different approach: instead of trying to evade detection by spoofing fingerprints, Browserbase browsers are verified as legitimate through direct partnerships with leading bot protection providers. Verified browsers are recognized by the protection systems themselves, yielding higher success rates and fewer interruptions than stealth plugins.

Browserbase's Agent Identity system includes automatic fingerprint rotation (canvas, WebGL, AudioContext, navigator properties), residential proxy rotation with geo-targeting, human-like browsing patterns (mouse movements, scroll velocity, typing cadence), and TLS fingerprint management. These features are enabled by default for all sessions -- no configuration needed. For sites that require additional stealth, you can configure custom proxy chains, user-agent rotation policies, and viewport randomization.

For self-hosted setups, consider these strategies: use puppeteer-extra-plugin-stealth with Stagehand's Puppeteer driver for basic evasion, rotate user agents from a curated list matching real browser distributions, add realistic delays between actions (Stagehand's AI naturally introduces human-like timing), and avoid headless-specific tells like the navigator.webdriver property. However, for high-volume or high-stakes automation, Browserbase's verified browser approach is significantly more reliable than client-side stealth alone.

Cost and Reliability Benchmarks

44% faster

Stagehand v3 vs v2 across all scenarios. The direct CDP architecture eliminates Playwright's protocol overhead, with the biggest gains on iframe and shadow DOM interactions.

~75% task completion

Stagehand agent with Claude Sonnet on the WebVoyager benchmark, compared to ~72% for Browser Use with GPT-4.1 and ~78% for Browser Use with Claude Opus. Hand-written Playwright scripts achieve ~98% but require hours of development per task.

<5% maintenance

Over 30 days on live production sites, Stagehand scripts required less than 5% prompt adjustments, compared to 15-25% selector fixes for Playwright/Puppeteer scripts. AI-powered automation is dramatically more resilient to UI changes.

$0.001-0.01 per action

Typical LLM cost per Stagehand action using Claude Sonnet. Action caching in v3 reduces this further -- repeated actions on known page structures skip the LLM entirely, making cost proportional to page diversity, not action volume.
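The claim that cost tracks page diversity rather than action volume follows from a simple model in which only cache misses reach the LLM. The sketch below encodes that model. `estimateLlmCostUsd` is a hypothetical helper, and the 80% hit rate in the example is an illustrative assumption, not a measured figure.

```typescript
// Rough cost model: only actions that miss the cache incur an LLM call.
function estimateLlmCostUsd(
  actions: number,
  cacheHitRate: number, // fraction between 0 and 1 (hypothetical input)
  costPerActionUsd: number
): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  const uncachedActions = actions * (1 - cacheHitRate);
  return uncachedActions * costPerActionUsd;
}

// 10,000 actions at $0.005 each, with and without an assumed 80% hit rate.
const withoutCache = estimateLlmCostUsd(10000, 0, 0.005);
const withCache = estimateLlmCostUsd(10000, 0.8, 0.005);
```

Under these assumptions the bill drops from about $50 to about $10, and it would stay near $10 even if the action count doubled on the same set of pages, since the extra actions would be cache hits.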

Browserbase from $20/mo

100 browser hours on the Developer plan, with CAPTCHA solving, residential proxies, session replay, and 3 concurrent browsers. Compare to building and maintaining your own headless browser infrastructure, proxy rotation, and CAPTCHA solving integrations.

Free tier available

Browserbase offers 1 browser hour free with 7-day data retention. Stagehand itself is MIT-licensed and free. The only required cost is LLM API usage (Anthropic, OpenAI, or Gemini).

Migration from Puppeteer

Stagehand v3 was designed for incremental adoption. Since it supports the Puppeteer driver natively, you can introduce Stagehand into an existing Puppeteer project without replacing your entire automation stack. Start by wrapping your Puppeteer browser instance with Stagehand, then gradually replace brittle selectors with AI-powered actions.

// Step 1: Existing Puppeteer script
import puppeteer from "puppeteer";
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://example.com");
// Fragile selectors -- break when the UI changes
await page.type("#email-input", "user@example.com");
await page.click("#login-form .submit-btn.primary");

// Step 2: The same flow as a Stagehand script (separate file)
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({
  env: "LOCAL",
  modelName: "claude-sonnet-4-20250514",
  modelClientOptions: {
    apiKey: process.env.ANTHROPIC_API_KEY
  }
});
await stagehand.init();
const page = stagehand.page;

await page.goto("https://example.com");

// Replace brittle selectors with natural language
await page.act("Fill the email field with user@example.com");
await page.act("Click the submit button");

// Mix and match: plain page calls such as evaluate() stay fast and deterministic
await page.evaluate(() => window.scrollTo(0, 0));

// Use Stagehand for dynamic or unpredictable elements
const data = await page.extract({
  instruction: "Extract the dashboard summary metrics",
  schema: z.object({
    totalUsers: z.string(),
    revenue: z.string(),
    growth: z.string()
  })
});

await stagehand.close();

The migration strategy is simple: keep deterministic actions as code, replace fragile selectors with AI. URL navigation, cookie setting, viewport configuration, and network interception stay as Puppeteer/CDP calls. Element interaction, data extraction, and form filling migrate to Stagehand primitives. This hybrid approach gives you the reliability of code where possible and the flexibility of AI where needed.
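One possible intermediate step during migration is to keep the existing selector as the fast path and fall back to a natural-language action only when it fails. The sketch below assumes a hypothetical `HybridPage` interface that exposes both a selector click and act(); it is not a Stagehand or Puppeteer type, and `clickWithFallback` is an illustrative helper.

```typescript
// Hypothetical minimal page surface; real drivers expose richer APIs.
interface HybridPage {
  click(selector: string): Promise<void>; // rejects if the selector is missing
  act(instruction: string): Promise<void>;
}

// Try the cheap selector first, then fall back to the AI-powered action.
async function clickWithFallback(
  page: HybridPage,
  selector: string,
  instruction: string
): Promise<"selector" | "ai"> {
  try {
    await page.click(selector);
    return "selector";
  } catch {
    await page.act(instruction);
    return "ai";
  }
}
```

Logging which path was taken tells you which selectors have gone stale, so they can be replaced with act() calls permanently instead of paying for a failed lookup on every run.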

Key API mappings for migration: page.click(selector) becomes page.act('click the [description]'), page.type(selector, text) becomes page.act('type [text] into the [description]'), page.$(selector) with manual extraction becomes page.extract({ instruction, schema }), and page.$$(selector) for element enumeration becomes page.observe('list all [element type]'). The Stagehand versions are more verbose in code but dramatically more resilient to UI changes.

Context Builder and Self-Healing Execution

Stagehand v3.2 (April 2026) introduces the Context Builder, a pre-processing layer that reduces LLM token consumption by up to 60% per action. Instead of sending the entire page DOM to the model for every act() or extract() call, the Context Builder performs local heuristic analysis to identify the relevant DOM subtree, strips non-visible elements, collapses repetitive structures, and compresses attribute data. The resulting context payload is typically 3-5x smaller than the full DOM, cutting both latency and cost per operation.
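Two of the steps named above, stripping non-visible elements and collapsing repetitive structures, can be illustrated on a toy tree. This is not the Context Builder's implementation. The `DomNode` shape, the `prune` function, and the rule of keeping the first three siblings in a run of same-tag children are assumptions chosen to show the idea.

```typescript
// Illustrative only: a toy DOM tree, not Stagehand's Context Builder.
interface DomNode {
  tag: string;
  visible: boolean;
  children: DomNode[];
}

// Drop invisible subtrees and keep only the first `keep` nodes in each run
// of consecutive same-tag siblings.
function prune(node: DomNode, keep = 3): DomNode | null {
  if (!node.visible) return null;
  const kept: DomNode[] = [];
  let runTag = "";
  let runLength = 0;
  for (const child of node.children) {
    const pruned = prune(child, keep);
    if (pruned === null) continue; // invisible subtree removed entirely
    runLength = pruned.tag === runTag ? runLength + 1 : 1;
    runTag = pruned.tag;
    if (runLength <= keep) kept.push(pruned);
  }
  return { ...node, children: kept };
}

// Count nodes so the size reduction can be measured.
function countNodes(node: DomNode): number {
  return 1 + node.children.reduce((sum, child) => sum + countNodes(child), 0);
}
```

A list of hundreds of near-identical rows shrinks to a few representative rows, which is often enough for a model to locate the right kind of element while sending far fewer tokens.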

Self-Healing Execution handles DOM shifts that occur between observation and action. When Stagehand identifies an element via the LLM but the DOM mutates before the CDP command executes (due to lazy loading, animations, or React re-renders), the self-healing system automatically re-scans the affected subtree, re-identifies the target element, and retries the action without requiring a full LLM round-trip. This reduces mid-action failures by roughly 40% on dynamic SPAs.
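The retry behavior described above can be sketched as a small loop: execute, and on a stale-element failure re-scan locally and try again. This is a hypothetical illustration, not Stagehand's implementation. `HealingDeps`, `staleElementError`, and `actWithHealing` are invented names, and the sketch is synchronous for brevity where a real driver would be asynchronous.

```typescript
// Hypothetical sketch of retry-on-stale-element; not Stagehand's implementation.
interface HealingDeps {
  execute(selector: string): void; // throws a "StaleElementError" if the node moved
  rescan(description: string): string | null; // cheap local re-lookup, no LLM call
}

function staleElementError(selector: string): Error {
  const err = new Error(`Stale element: ${selector}`);
  err.name = "StaleElementError";
  return err;
}

function actWithHealing(
  deps: HealingDeps,
  selector: string,
  description: string,
  maxRetries = 2
): string {
  let current = selector;
  for (let attempt = 0; ; attempt++) {
    try {
      deps.execute(current);
      return current; // the selector that finally worked
    } catch (err) {
      const isStale = err instanceof Error && err.name === "StaleElementError";
      if (!isStale || attempt >= maxRetries) throw err;
      const healed = deps.rescan(description);
      if (healed === null) throw err; // nothing to heal to: surface the failure
      current = healed;
    }
  }
}
```

Bounding the retries matters: a page that never settles should surface as an error after a few attempts rather than loop indefinitely.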

Both features leverage the driver-agnostic design from the v3 rewrite. Context Builder and Self-Healing work identically whether you use the built-in CDP driver, the Puppeteer driver, or a custom driver implementation. This ensures consistent behavior across local development and Browserbase production environments.
