Puppeteer: Browser Automation, Testing, and Web Scraping

The complete guide to controlling headless Chrome programmatically, from simple screenshots to complex E2E testing, performance auditing with Lighthouse, and production-grade web scraping. The patterns come from real projects: automating 348 screenshot exports and wiring Puppeteer into CI/CD pipelines.

Puppeteer, Headless Chrome, CDP, Lighthouse, Jest, puppeteer-extra, Docker, GitHub Actions, Playwright

1. What Is Puppeteer?

Puppeteer is a Node.js library developed by the Chrome DevTools team that provides a high-level API to control Chrome or Chromium browsers programmatically. It launches a browser instance (headless by default, but optionally with a visible UI), and gives you complete control: navigate to URLs, click buttons, fill forms, take screenshots, generate PDFs, intercept network requests, and execute JavaScript in the page context.

Puppeteer communicates with the browser via the Chrome DevTools Protocol (CDP), the same protocol used by Chrome DevTools. This means anything you can do manually in a browser, Puppeteer can automate programmatically. It is the de facto standard for headless browser automation in the Node.js ecosystem, used by millions of developers for testing, scraping, screenshot generation, and workflow automation.
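
Because Puppeteer sits directly on top of CDP, a script can drop down to raw protocol commands when the high-level API has no matching method. The following is a minimal sketch, assuming a `page` object created with `browser.newPage()`; `getHeapUsage` is a hypothetical helper name, while `Performance.enable` and `Performance.getMetrics` are CDP methods.

```javascript
// Hypothetical helper: read JavaScript heap metrics through a raw CDP session.
async function getHeapUsage(page) {
  const client = await page.createCDPSession();
  try {
    await client.send('Performance.enable');
    const { metrics } = await client.send('Performance.getMetrics');
    // metrics is an array of { name, value } pairs; index it by name.
    const byName = Object.fromEntries(metrics.map((m) => [m.name, m.value]));
    return {
      usedBytes: byName.JSHeapUsedSize,
      totalBytes: byName.JSHeapTotalSize,
    };
  } finally {
    // Detach so the session does not outlive the measurement.
    await client.detach();
  }
}
```

The same pattern applies to any CDP domain: open a session, send commands, and detach when finished.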

Key advantages over alternatives: Puppeteer downloads a compatible Chrome for Testing build at install time (zero-config setup), tracks new Chrome features closely, provides native CDP access for advanced debugging, and integrates with Lighthouse for performance auditing. It works on all major platforms (macOS, Linux, Windows), runs in Docker containers for CI/CD, and has an active community with extensive documentation. Current version is v24.26.0 (April 2026). Since v23, WebDriver BiDi is production-ready and is the default protocol for Firefox connections. CDP remains the default for Chrome.

2. Key Features

CORE

Headless Chrome Control

Launch Chrome in headless mode (no visible UI) or headed mode (for debugging). Control browser lifecycle, create multiple pages/tabs, manage cookies and sessions, set viewport sizes, emulate devices, and configure proxy settings.

CORE

Page Navigation and Interaction

Navigate to URLs with configurable wait conditions (load, domcontentloaded, networkidle0, networkidle2). Click elements, type text, select options, drag-and-drop, hover, and handle keyboard/mouse events. Wait for elements, selectors, or custom conditions before proceeding.

CAPTURE

Screenshots and PDF Generation

Capture full-page screenshots, element-level screenshots, or viewport screenshots in PNG/JPEG/WebP. Generate PDFs with custom page sizes, margins, headers, and footers. Supports clip regions, quality settings, and transparent backgrounds.

FORM

Form Filling and Submission

Type into input fields with realistic keystroke timing, select dropdown options, check/uncheck checkboxes, upload files, and submit forms. Handle multi-step forms, CAPTCHA workarounds, and dynamic form validation that requires JavaScript execution.

NET

Network Interception

Intercept, modify, or block network requests, and observe or mock responses. Mock API endpoints for testing, block ads and trackers for clean scraping, modify request headers for authentication, and log all network activity for debugging. Requests can be modified in flight; responses can be inspected or replaced with a mocked response.

TEST

End-to-End Testing

Build comprehensive E2E test suites that test your application through a real browser. Combine with Jest, Mocha, or any test framework. Assert on page content, visual state, network requests, console output, and JavaScript errors. Visual regression testing with screenshot comparison.

PERF

Performance Testing (Lighthouse)

Integrate with Lighthouse for automated performance auditing. Measure lab metrics such as LCP, CLS, and Total Blocking Time (the lab proxy for INP, which needs real interactions), plus accessibility scores, SEO compliance, and best practices. Run audits in CI/CD pipelines to catch performance regressions before deployment.

SCRAPE

Web Scraping

Extract data from JavaScript-rendered pages that static HTTP clients cannot handle. Handle infinite scroll, lazy loading, pagination, and dynamic content. Respect robots.txt, implement rate limiting, rotate user agents, and manage IP rotation for responsible scraping.

CORE

JavaScript Execution

Execute arbitrary JavaScript in the page context. Access and modify the DOM, call page functions, extract data from JavaScript variables, inject scripts, and interact with client-side frameworks (React state, Vue reactivity, Angular services).

ENV

Device and Network Emulation

Emulate mobile devices (iPhone, Pixel, iPad), set custom viewports, simulate slow networks (3G, offline), set geolocation, adjust timezone, and configure color scheme preferences (dark mode testing). Built-in device descriptors for 50+ devices.

ENV

Docker and CI/CD Integration

Run Puppeteer in Docker containers with pre-configured Chromium. Official Docker images available. Integrate with GitHub Actions, GitLab CI, Jenkins, and CircleCI. Headless mode eliminates display server requirements in CI environments.

CAPTURE

Tracing and Coverage

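Record Chrome traces for performance analysis (view in chrome://tracing or the DevTools Performance panel). Measure JavaScript and CSS code coverage to identify unused code. Export network activity as HAR through a third-party helper or raw CDP events, since HAR capture is not part of the core API. Built-in CDP integration for advanced profiling.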

3. How I Use It

My most intensive Puppeteer project was the Xiaomi app screenshot export challenge. The Xiaomi Health app stores body composition reports that cannot be exported through any built-in feature. I built a Puppeteer automation that navigated the web version of the Xiaomi ecosystem, authenticated, scrolled through 348 individual reports, captured high-resolution screenshots of each one, and organized them by date. What would have taken days of manual screenshotting was completed in under 2 hours of automated execution.

I integrate Puppeteer with Lighthouse in CI/CD pipelines for performance regression testing. Every pull request triggers a Lighthouse audit that measures Core Web Vitals, accessibility scores, and SEO compliance. If any metric drops below the configured threshold, the pipeline fails and the PR cannot merge. This has prevented dozens of performance regressions from reaching production across multiple projects.

For the health dashboard on this website, I use Puppeteer for automated visual validation. A scheduled job navigates to the dashboard, captures screenshots of all chart views, compares them against baseline images, and alerts me if any visual change is detected. This catches CSS regressions, data rendering bugs, and broken chart configurations that unit tests would miss.

Puppeteer also powers the browser automation capability in my OpenClaw deployment. When my AI assistant needs to interact with web applications -- checking flight prices, filling expense reports, downloading invoices -- it delegates to a Puppeteer instance that executes the interaction and returns results. This gives my AI assistant eyes and hands on the web.

Jose's Experience: 348 Xiaomi body composition reports exported automatically, Lighthouse CI enforcing performance budgets on every PR, and visual regression testing protecting the health dashboard from silent breakage.

4. Getting Started

Puppeteer installs with a single npm command and downloads a compatible Chrome for Testing build during installation, so no external browser setup is required.

# Install Puppeteer (downloads Chrome for Testing automatically)
npm install puppeteer

# Or install without a browser download (use system Chrome via executablePath or channel)
npm install puppeteer-core

# Verify installation
node -e "import('puppeteer').then(p => p.default.launch().then(b => { console.log('OK'); b.close(); }))"
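
With puppeteer-core there is no downloaded browser, so the launch call needs to be told where Chrome lives. A minimal sketch of building launch options from the environment follows; `buildLaunchOptions` and the `CHROME_PATH` variable are names invented for this example, while `executablePath` and `--disable-dev-shm-usage` are existing launch options and Chrome flags.

```javascript
// Build Puppeteer launch options from environment variables.
// CHROME_PATH is a hypothetical variable name chosen for this sketch.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true, args: [] };
  if (env.CHROME_PATH) {
    // puppeteer-core ships without a browser, so it needs an explicit path.
    options.executablePath = env.CHROME_PATH;
  }
  if (env.CI) {
    // Containers often have a small /dev/shm; this flag makes Chrome use /tmp.
    options.args.push('--disable-dev-shm-usage');
  }
  return options;
}

// Usage (inside a script that imports puppeteer or puppeteer-core):
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Keeping the options in one function means local runs and CI runs share the same script and differ only in their environment.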

5. Page Navigation and Waiting Strategies

Choosing the right wait strategy is critical for reliable automation. Puppeteer provides multiple mechanisms for waiting on page load, element visibility, and custom conditions. Using the wrong strategy leads to flaky scripts that fail intermittently.

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Strategy 1: Wait for specific load events
await page.goto('https://example.com', {
  waitUntil: 'networkidle0'  // Wait until 0 network connections for 500ms
});

// Strategy 2: Wait for a specific element to appear
await page.goto('https://example.com/dashboard');
await page.waitForSelector('.chart-container', {
  visible: true,
  timeout: 10000
});

// Strategy 3: Wait for a function to return true
await page.waitForFunction(
  () => document.querySelectorAll('.data-row').length > 10,
  { timeout: 15000, polling: 500 }
);

// Strategy 4: Wait for navigation after a click
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.click('#next-page')
]);

// Strategy 5: Wait for a network response
const response = await page.waitForResponse(
  res => res.url().includes('/api/data') && res.status() === 200
);
const data = await response.json();

// Strategy 6: Custom wait with timeout
async function waitForCondition(page, fn, timeout = 10000) {
  const start = Date.now();
  while (Date.now() - start < timeout) {
    const result = await page.evaluate(fn);
    if (result) return result;
    await new Promise(r => setTimeout(r, 200));
  }
  throw new Error('Condition not met within timeout');
}

await browser.close();
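
One common source of flakiness with Strategies 4 and 5 is registering the wait after the action that triggers it, which can miss a fast response. The sketch below, assuming a hypothetical helper name `clickAndWaitForResponse`, starts the wait first and then clicks, in the same `Promise.all` style used above.

```javascript
// Hypothetical helper: start listening for the response BEFORE clicking,
// so a fast response cannot arrive before the wait is registered.
async function clickAndWaitForResponse(page, selector, urlPart, timeout = 10000) {
  const [response] = await Promise.all([
    page.waitForResponse(
      (res) => res.url().includes(urlPart) && res.ok(),
      { timeout }
    ),
    page.click(selector),
  ]);
  return response;
}

// Usage:
//   const res = await clickAndWaitForResponse(page, '#load-more', '/api/data');
//   const data = await res.json();
```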

6. Screenshots and PDF Generation

Puppeteer supports full-page screenshots, element-level captures, clip regions, and multiple output formats. PDF generation includes custom page sizes, margins, headers, footers, and print-specific CSS.

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.goto('https://example.com', { waitUntil: 'networkidle0' });

// Full-page screenshot (captures entire scrollable area)
await page.screenshot({
  path: 'full-page.png',
  fullPage: true
});

// Viewport screenshot (only visible area)
await page.screenshot({
  path: 'viewport.png',
  fullPage: false
});

// Element-level screenshot
const element = await page.$('.hero-section');
await element.screenshot({ path: 'hero.png' });

// Screenshot with clip region and quality settings
await page.screenshot({
  path: 'clipped.jpeg',
  type: 'jpeg',
  quality: 85,
  clip: { x: 0, y: 0, width: 800, height: 600 }
});

// WebP format with transparent background
await page.screenshot({
  path: 'transparent.webp',
  type: 'webp',
  omitBackground: true
});

// PDF with custom layout
await page.pdf({
  path: 'document.pdf',
  format: 'A4',
  printBackground: true,
  margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
  displayHeaderFooter: true,
  headerTemplate: '<div style="font-size:10px;text-align:center;width:100%">Report</div>',
  footerTemplate: '<div style="font-size:10px;text-align:center;width:100%">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>'
});

// Batch screenshots (used for my 348 Xiaomi exports)
// getReportUrls() is a placeholder for your own code that returns an array of URLs
const reportUrls = getReportUrls();
for (const [i, url] of reportUrls.entries()) {
  await page.goto(url, { waitUntil: 'networkidle0' });
  await page.screenshot({
    path: `reports/report-${String(i).padStart(4, '0')}.png`,
    fullPage: true
  });
}

await browser.close();
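
In a long batch, a single failed navigation should not abort the whole run. The following is a sketch of a more tolerant loop, not the exact script used for the Xiaomi export; `captureAll` is a hypothetical helper that records failures and keeps going.

```javascript
// Hypothetical helper: screenshot every URL, collecting failures instead of
// stopping at the first one.
async function captureAll(page, urls, outDir = 'reports') {
  const saved = [];
  const failed = [];
  for (const [i, url] of urls.entries()) {
    const path = `${outDir}/report-${String(i).padStart(4, '0')}.png`;
    try {
      await page.goto(url, { waitUntil: 'networkidle0' });
      await page.screenshot({ path, fullPage: true });
      saved.push(path);
    } catch (error) {
      // Keep the URL so the failed captures can be retried later.
      failed.push({ url, message: error.message });
    }
  }
  return { saved, failed };
}
```

The returned `failed` list can be fed back into a second pass, which is usually enough to recover from transient network errors.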

7. Form Filling and Interaction

Puppeteer can simulate realistic user interactions including typing with keystroke delays, selecting dropdown options, uploading files, and handling multi-step forms with dynamic validation.

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://example.com/login');

// Type with realistic delay between keystrokes
await page.type('#email', 'user@example.com', { delay: 50 });
await page.type('#password', 'secure-password', { delay: 50 });

// Click submit and wait for navigation
await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.click('#submit-btn')
]);

// Select dropdown option
await page.select('#country', 'CO');

// Check/uncheck checkboxes
await page.click('#terms-checkbox');

// Upload a file
const fileInput = await page.$('input[type="file"]');
await fileInput.uploadFile('/path/to/document.pdf');

// Clear an input field before typing
await page.click('#search', { clickCount: 3 });
await page.type('#search', 'new search term');

// Handle a multi-step form
await page.type('#step1-name', 'Jose Nobile');
await page.click('#next-step');
await page.waitForSelector('#step2-address', { visible: true });
await page.type('#step2-address', '123 Main St');
await page.click('#next-step');
await page.waitForSelector('#step3-confirm', { visible: true });
await page.click('#submit-final');

// Verify success
const welcomeText = await page.$eval(
  '.welcome-message',
  el => el.textContent
);
console.log('Logged in:', welcomeText);

await browser.close();

8. Network Interception (Request and Response)

Intercept and modify network requests to mock APIs, block unnecessary resources, inject authentication headers, or log all traffic. Responses can be observed through the response event or replaced wholesale with request.respond(); rewriting a real response body in flight requires dropping down to the CDP Fetch domain. This is essential for testing (mocking backends), scraping (blocking ads/trackers), and debugging (inspecting API calls).

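// Assumes `page` is an open page and `token` holds an API token;
// neither is defined in this snippet
// Request interception: block, modify, or log
await page.setRequestInterception(true);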

page.on('request', (request) => {
  // Block images and stylesheets for faster scraping
  if (['image', 'stylesheet', 'font'].includes(
    request.resourceType()
  )) {
    request.abort();
    return;
  }

  // Add auth header to API requests
  if (request.url().includes('/api/')) {
    request.continue({
      headers: {
        ...request.headers(),
        'Authorization': 'Bearer ' + token
      }
    });
    return;
  }

  request.continue();
});

// Response interception: log and inspect responses
page.on('response', async (response) => {
  const url = response.url();
  if (url.includes('/api/')) {
    console.log(`${response.status()} ${url}`);
    try {
      const body = await response.json();
      console.log('Response body:', JSON.stringify(body).slice(0, 200));
    } catch {
      // Response was not JSON
    }
  }
});

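// Mock an API endpoint entirely
// Note: each intercepted request may be resolved only once. On a given page,
// use this handler INSTEAD of the blocking/auth handler above (or merge the
// two), otherwise the second call throws "Request is already handled!"
await page.setRequestInterception(true);
page.on('request', (request) => {
  // Skip requests that another handler has already resolved
  if (request.isInterceptResolutionHandled()) return;

  if (request.url().includes('/api/user/profile')) {
    request.respond({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({
        name: 'Test User',
        email: 'test@example.com',
        plan: 'premium'
      })
    });
    return;
  }
  request.continue();
});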

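// Capture all network requests as a HAR-like log
const networkLog = [];
// Key entries by request object: matching on URL alone would attach a
// response to the wrong entry when the same URL is requested more than once
const pending = new Map();

page.on('request', (req) => {
  const entry = {
    url: req.url(),
    method: req.method(),
    resourceType: req.resourceType(),
    timestamp: Date.now()
  };
  pending.set(req, entry);
  networkLog.push(entry);
});

page.on('response', (res) => {
  const entry = pending.get(res.request());
  if (entry) {
    entry.status = res.status();
    entry.duration = Date.now() - entry.timestamp;
    pending.delete(res.request());
  }
});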
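
Blocking, header injection, and mocking are easier to reason about when the decision is separated from the side effect and applied by a single handler. The sketch below is one way to structure that, under the assumption that `page.setRequestInterception(true)` has already been called; `decideRequest`, `attachInterceptor`, and the `mocks` option are names invented for this example.

```javascript
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Pure function: decide what to do with a request, without touching it.
function decideRequest({ url, resourceType, headers }, { token, mocks = {} } = {}) {
  const mockPath = Object.keys(mocks).find((path) => url.includes(path));
  if (mockPath) {
    return {
      action: 'respond',
      response: {
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify(mocks[mockPath]),
      },
    };
  }
  if (BLOCKED_TYPES.has(resourceType)) return { action: 'abort' };
  if (token && url.includes('/api/')) {
    const withAuth = { ...headers, Authorization: `Bearer ${token}` };
    return { action: 'continue', overrides: { headers: withAuth } };
  }
  return { action: 'continue', overrides: {} };
}

// Single handler, so every intercepted request is resolved exactly once.
function attachInterceptor(page, options) {
  page.on('request', (request) => {
    const decision = decideRequest(
      {
        url: request.url(),
        resourceType: request.resourceType(),
        headers: request.headers(),
      },
      options
    );
    if (decision.action === 'respond') request.respond(decision.response);
    else if (decision.action === 'abort') request.abort();
    else request.continue(decision.overrides);
  });
}
```

Because `decideRequest` takes plain data, its rules can be unit tested without launching a browser.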

9. End-to-End Testing

Build comprehensive E2E test suites that exercise your application through a real browser. Puppeteer integrates with Jest, Mocha, or any Node.js test framework. Assert on page content, visual state, network activity, console output, and JavaScript errors.

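// e2e.test.js (with Jest)
import puppeteer from 'puppeteer';
// toMatchImageSnapshot is not built into Jest; it comes from jest-image-snapshot
import { toMatchImageSnapshot } from 'jest-image-snapshot';

expect.extend({ toMatchImageSnapshot });

let browser, page;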

beforeAll(async () => {
  browser = await puppeteer.launch();
  page = await browser.newPage();

  // Capture console errors
  page.on('console', msg => {
    if (msg.type() === 'error') {
      console.error('PAGE ERROR:', msg.text());
    }
  });

  // Capture uncaught exceptions
  page.on('pageerror', err => {
    console.error('UNCAUGHT:', err.message);
  });
});

afterAll(async () => {
  await browser.close();
});

describe('Login flow', () => {
  test('should login with valid credentials', async () => {
    await page.goto('http://localhost:3000/login');
    await page.type('#email', 'user@example.com');
    await page.type('#password', 'valid-password');

    await Promise.all([
      page.waitForNavigation(),
      page.click('#login-btn')
    ]);

    const url = page.url();
    expect(url).toContain('/dashboard');

    const welcome = await page.$eval('h1', el => el.textContent);
    expect(welcome).toContain('Welcome');
  });

  test('should show error for invalid credentials', async () => {
    await page.goto('http://localhost:3000/login');
    await page.type('#email', 'bad@example.com');
    await page.type('#password', 'wrong');
    await page.click('#login-btn');

    await page.waitForSelector('.error-message', { visible: true });
    const error = await page.$eval('.error-message', el => el.textContent);
    expect(error).toContain('Invalid');
  });

  test('should pass visual regression', async () => {
    await page.goto('http://localhost:3000/dashboard');
    const screenshot = await page.screenshot();
    expect(screenshot).toMatchImageSnapshot({
      failureThreshold: 0.01,
      failureThresholdType: 'percent'
    });
  });
});
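
The suite above logs console errors and uncaught exceptions but never asserts on them. One possible way to turn them into test failures is to collect them in an array and assert that it is empty; `collectPageErrors` is a hypothetical helper for this sketch.

```javascript
// Hypothetical helper: gather console errors and uncaught exceptions
// so a test can assert that none occurred.
function collectPageErrors(page) {
  const errors = [];
  page.on('console', (msg) => {
    if (msg.type() === 'error') errors.push(`console: ${msg.text()}`);
  });
  page.on('pageerror', (err) => {
    errors.push(`uncaught: ${err.message}`);
  });
  return errors;
}

// Usage inside a Jest test:
//   const errors = collectPageErrors(page);
//   await page.goto('http://localhost:3000/dashboard');
//   expect(errors).toEqual([]);
```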

10. Lighthouse Integration (Programmatic API)

Run Lighthouse audits programmatically within your CI/CD pipeline. Set performance budgets and fail builds that exceed them. Track metrics over time to detect gradual performance degradation. Combine with Puppeteer navigation to audit authenticated pages and complex user flows.

// lighthouse-audit.js
import puppeteer from 'puppeteer';
import lighthouse from 'lighthouse';

const browser = await puppeteer.launch({
  args: ['--remote-debugging-port=9222']
});

// Optionally navigate to authenticated pages first
const page = await browser.newPage();
await page.goto('https://mysite.com/login');
await page.type('#email', process.env.TEST_EMAIL);
await page.type('#password', process.env.TEST_PASSWORD);
await Promise.all([
  page.waitForNavigation(),
  page.click('#login-btn')
]);
await page.close();

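// Run Lighthouse audit on the authenticated session
// Flags (second argument) select the browser port, output, and categories
const flags = {
  port: 9222,
  output: 'json',
  onlyCategories: ['performance', 'accessibility', 'seo', 'best-practices'],
};

// Settings belong in the config object (third argument), not in the flags
const config = {
  extends: 'lighthouse:default',
  settings: {
    formFactor: 'desktop',
    screenEmulation: { disabled: true },
    throttling: {
      rttMs: 40,
      throughputKbps: 10240,
      cpuSlowdownMultiplier: 1,
    },
  },
};

const result = await lighthouse('https://mysite.com/dashboard', flags, config);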

const { lhr } = result;
const scores = {
  performance: lhr.categories.performance.score * 100,
  accessibility: lhr.categories.accessibility.score * 100,
  seo: lhr.categories.seo.score * 100,
  bestPractices: lhr.categories['best-practices'].score * 100,
  lcp: lhr.audits['largest-contentful-paint'].numericValue,
  cls: lhr.audits['cumulative-layout-shift'].numericValue,
  tbt: lhr.audits['total-blocking-time'].numericValue,
};

console.table(scores);

// Fail CI if below threshold
const THRESHOLD = 90;
for (const [key, value] of Object.entries(scores)) {
  if (['performance', 'accessibility', 'seo', 'bestPractices'].includes(key)
      && value < THRESHOLD) {
    console.error(`FAIL: ${key} = ${value} (threshold: ${THRESHOLD})`);
    process.exit(1);
  }
}

await browser.close();

Jose's Experience: I run Lighthouse CI on every pull request for josenobile.co, maintaining 100/100/100/100 scores. The programmatic API lets me audit authenticated pages like the health dashboard that the CLI cannot reach without a login step.
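
Performance budgets usually cover metric values as well as category scores. The sketch below checks both against a Lighthouse result object; `checkBudgets` and the shape of the `budgets` argument are assumptions made for this example, while the category and audit ids are the ones used in the script above.

```javascript
// Hypothetical helper: compare a Lighthouse result (lhr) against budgets.
// budgets.categories holds minimum scores (0-100);
// budgets.audits holds maximum numeric values (ms, or unitless for CLS).
function checkBudgets(lhr, budgets) {
  const failures = [];
  for (const [id, minScore] of Object.entries(budgets.categories ?? {})) {
    const score = Math.round((lhr.categories[id]?.score ?? 0) * 100);
    if (score < minScore) {
      failures.push(`${id}: score ${score} is below ${minScore}`);
    }
  }
  for (const [id, maxValue] of Object.entries(budgets.audits ?? {})) {
    const value = lhr.audits[id]?.numericValue;
    if (value === undefined || value > maxValue) {
      failures.push(`${id}: value ${value} exceeds ${maxValue}`);
    }
  }
  return failures;
}

// Usage:
//   const failures = checkBudgets(lhr, {
//     categories: { performance: 90, accessibility: 90 },
//     audits: { 'largest-contentful-paint': 2500, 'cumulative-layout-shift': 0.1 },
//   });
//   if (failures.length > 0) { console.error(failures.join('\n')); process.exit(1); }
```

Returning the full list of failures, rather than exiting on the first one, lets a single CI run report every budget that was missed.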

11. Web Scraping (Pagination and Infinite Scroll)

Puppeteer excels at scraping JavaScript-rendered pages that static HTTP clients cannot handle. It supports pagination, infinite scroll, lazy-loaded content, and authenticated scraping. Always respect robots.txt and implement rate limiting for responsible data extraction.

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// --- Pagination scraping ---
const allProducts = [];
let currentPage = 1;
const maxPages = 20;

while (currentPage <= maxPages) {
  await page.goto(`https://example.com/products?page=${currentPage}`, {
    waitUntil: 'networkidle0'
  });

  const products = await page.$$eval('.product-card', cards =>
    cards.map(card => ({
      name: card.querySelector('.name')?.textContent?.trim(),
      price: card.querySelector('.price')?.textContent?.trim(),
      url: card.querySelector('a')?.href
    }))
  );

  if (products.length === 0) break;
  allProducts.push(...products);
  currentPage++;

  // Rate limiting: wait between requests
  await new Promise(r => setTimeout(r, 1500));
}

// --- Infinite scroll scraping ---
await page.goto('https://example.com/feed', {
  waitUntil: 'networkidle0'
});

let previousHeight = 0;
let scrollAttempts = 0;
const maxScrolls = 50;

while (scrollAttempts < maxScrolls) {
  // Scroll to bottom
  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

  // Wait for new content to load
  await new Promise(r => setTimeout(r, 2000));
  await page.waitForFunction(
    `document.body.scrollHeight > ${previousHeight}`,
    { timeout: 5000 }
  ).catch(() => null);

  const newHeight = await page.evaluate(() => document.body.scrollHeight);
  if (newHeight === previousHeight) break;

  previousHeight = newHeight;
  scrollAttempts++;
}

// Extract all loaded items
const feedItems = await page.$$eval('.feed-item', items =>
  items.map(item => ({
    title: item.querySelector('h2')?.textContent?.trim(),
    content: item.querySelector('.body')?.textContent?.trim(),
    date: item.querySelector('time')?.getAttribute('datetime')
  }))
);

console.log(`Scraped ${feedItems.length} items`);
await browser.close();
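
The fixed `setTimeout` delay in the pagination loop adds the full pause even when the page itself took several seconds to load. A small limiter that enforces a minimum interval between request starts is one alternative; this is a sketch, and `createRateLimiter` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns a function that resolves only when at least
// minIntervalMs has passed since the previous turn was granted.
function createRateLimiter(minIntervalMs) {
  let nextAllowed = 0;
  return async function waitTurn() {
    const now = Date.now();
    const wait = Math.max(0, nextAllowed - now);
    // Reserve the next slot before sleeping so queued callers stay spaced out.
    nextAllowed = Math.max(now, nextAllowed) + minIntervalMs;
    if (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  };
}

// Usage:
//   const waitTurn = createRateLimiter(1500);
//   for (const url of urls) {
//     await waitTurn();
//     await page.goto(url, { waitUntil: 'networkidle0' });
//   }
```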

12. Cookie and Session Management

Puppeteer provides full control over cookies and browser storage. You can save and restore sessions to avoid re-authenticating, share cookies between pages, and manage storage across browser contexts for isolation.

import puppeteer from 'puppeteer';
import { writeFileSync, readFileSync, existsSync } from 'fs';

const COOKIES_FILE = './session-cookies.json';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Restore cookies from a previous session
if (existsSync(COOKIES_FILE)) {
  const cookies = JSON.parse(readFileSync(COOKIES_FILE, 'utf-8'));
  await page.setCookie(...cookies);
  console.log('Session restored from cookies file');
}

await page.goto('https://example.com/dashboard');

// Check if session is still valid
const isLoggedIn = await page.evaluate(
  () => !!document.querySelector('.user-profile')
);

if (!isLoggedIn) {
  // Re-authenticate
  await page.goto('https://example.com/login');
  await page.type('#email', 'user@example.com');
  await page.type('#password', 'password');
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-btn')
  ]);
}

// Save cookies for next run
const cookies = await page.cookies();
writeFileSync(COOKIES_FILE, JSON.stringify(cookies, null, 2));

// Manage localStorage (sessionStorage exposes the same setItem/getItem API)
await page.evaluate(() => {
  localStorage.setItem('theme', 'dark');
  localStorage.setItem('language', 'en-US');
});

// Use incognito context for isolated sessions
const context = await browser.createBrowserContext();
const privatePage = await context.newPage();
// This page has no cookies or storage from the main context
await privatePage.goto('https://example.com');
await context.close();

await browser.close();
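
A saved cookie file can contain entries that expired between runs. The sketch below filters them out before restoring; it relies on the cookie objects returned by Puppeteer, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie. `dropExpiredCookies` is a hypothetical helper name.

```javascript
// Hypothetical helper: keep session cookies and cookies that are still valid.
function dropExpiredCookies(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter((cookie) => {
    // No expiry or -1 means a session cookie: keep it.
    if (cookie.expires === undefined || cookie.expires === -1) return true;
    return cookie.expires > nowSeconds;
  });
}

// Usage:
//   const saved = JSON.parse(readFileSync(COOKIES_FILE, 'utf-8'));
//   await page.setCookie(...dropExpiredCookies(saved));
```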

13. Device Emulation

Puppeteer includes built-in device descriptors for 50+ devices. You can emulate mobile viewports, touch events, device pixel ratios, user agents, slow network connections, geolocation, timezone, and color scheme preferences for comprehensive cross-device testing.

import puppeteer, { KnownDevices } from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Emulate a specific device
const iPhone14 = KnownDevices['iPhone 14 Pro Max'];
await page.emulate(iPhone14);
await page.goto('https://example.com');
await page.screenshot({ path: 'iphone14.png' });

// Emulate Pixel 5
const pixel5 = KnownDevices['Pixel 5'];
await page.emulate(pixel5);
await page.goto('https://example.com');
await page.screenshot({ path: 'pixel5.png' });

// Custom viewport with device pixel ratio
await page.setViewport({
  width: 375,
  height: 812,
  deviceScaleFactor: 3,
  isMobile: true,
  hasTouch: true
});

// Simulate slow 3G network
const client = await page.createCDPSession();
await client.send('Network.emulateNetworkConditions', {
  offline: false,
  downloadThroughput: (500 * 1024) / 8,  // 500 Kbps
  uploadThroughput: (500 * 1024) / 8,
  latency: 400
});

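// Set geolocation (Bogota, Colombia). The origin must also be granted the
// geolocation permission, or navigator.geolocation calls will be denied
await browser.defaultBrowserContext()
  .overridePermissions('https://example.com', ['geolocation']);
await page.setGeolocation({
  latitude: 4.7110,
  longitude: -74.0721
});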

// Set timezone
await page.emulateTimezone('America/Bogota');

// Emulate dark mode
await page.emulateMediaFeatures([
  { name: 'prefers-color-scheme', value: 'dark' }
]);
await page.screenshot({ path: 'dark-mode.png' });

// Emulate light mode
await page.emulateMediaFeatures([
  { name: 'prefers-color-scheme', value: 'light' }
]);
await page.screenshot({ path: 'light-mode.png' });

await browser.close();
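
The throughput fields of `Network.emulateNetworkConditions` are expressed in bytes per second, which is why the example above converts from kilobits. A small helper can keep that conversion in one place; this is a sketch, and `networkProfile` plus the preset values in `PROFILES` are illustrative assumptions rather than official presets.

```javascript
// Hypothetical helper: build CDP network conditions from human-friendly units.
function networkProfile({ downKbps, upKbps, latencyMs }) {
  // CDP expects bytes per second; inputs are kilobits per second.
  const toBytesPerSecond = (kbps) => (kbps * 1024) / 8;
  return {
    offline: false,
    downloadThroughput: toBytesPerSecond(downKbps),
    uploadThroughput: toBytesPerSecond(upKbps),
    latency: latencyMs,
  };
}

// Illustrative presets (assumed values, tune them for your own tests).
const PROFILES = {
  slow3g: networkProfile({ downKbps: 500, upKbps: 500, latencyMs: 400 }),
  fast3g: networkProfile({ downKbps: 1600, upKbps: 750, latencyMs: 150 }),
};

// Usage:
//   const client = await page.createCDPSession();
//   await client.send('Network.emulateNetworkConditions', PROFILES.slow3g);
```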

14. JavaScript Execution in Page Context

Puppeteer lets you execute arbitrary JavaScript in the page context, bridging the Node.js environment and the browser. Use page.evaluate() for one-off execution, page.exposeFunction() to call Node.js functions from the browser, and page.evaluateHandle() for DOM object references.

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');

// Execute JS and return a primitive value
const title = await page.evaluate(() => document.title);
const linkCount = await page.evaluate(
  () => document.querySelectorAll('a').length
);

// Pass arguments from Node.js to page context
const selector = '.article';
const articles = await page.evaluate((sel) => {
  return Array.from(document.querySelectorAll(sel)).map(el => ({
    title: el.querySelector('h2')?.textContent,
    href: el.querySelector('a')?.href
  }));
}, selector);

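// Access framework state (React example). This reads private React internals:
// _reactRootContainer exists only on legacy roots (ReactDOM.render), not on
// React 18 createRoot, so treat it as a fragile, illustrative technique
const reactState = await page.evaluate(() => {
  const fiber = document.querySelector('#app')?._reactRootContainer
    ?._internalRoot?.current;
  return fiber?.memoizedState;
});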

// Expose a Node.js function to the page
await page.exposeFunction('saveToFile', async (data) => {
  const { writeFileSync } = await import('fs');
  writeFileSync('scraped-data.json', JSON.stringify(data));
});

// Call the exposed function from page context
await page.evaluate(async () => {
  const data = { timestamp: Date.now(), items: ['a', 'b', 'c'] };
  await window.saveToFile(data);
});

// Get a JSHandle for complex objects
const bodyHandle = await page.evaluateHandle(() => document.body);
const html = await page.evaluate(body => body.innerHTML, bodyHandle);
await bodyHandle.dispose();

// Inject a script into the page
await page.addScriptTag({
  content: 'window.__injected = true; console.log("Script injected");'
});
await page.addStyleTag({
  content: '.debug { outline: 2px solid red !important; }'
});

await browser.close();

15. Tracing and Coverage

Puppeteer can record Chrome DevTools traces for performance analysis and measure JavaScript/CSS code coverage to identify unused code. Traces can be viewed in chrome://tracing or Chrome DevTools Performance panel. Coverage data helps optimize bundle sizes by removing dead code.

import puppeteer from 'puppeteer';
import { writeFileSync } from 'fs';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// --- Chrome Tracing ---
// Start recording a trace
await page.tracing.start({
  path: 'trace.json',
  categories: ['devtools.timeline', 'blink.user_timing', 'v8']
});

await page.goto('https://example.com', { waitUntil: 'networkidle0' });

// Perform some interactions while tracing
await page.click('.menu-toggle');
await page.waitForSelector('.menu-open', { visible: true });

// Stop tracing (writes to trace.json)
await page.tracing.stop();
// Open trace.json in chrome://tracing for analysis

// --- JavaScript Coverage ---
await page.coverage.startJSCoverage();
await page.goto('https://example.com', { waitUntil: 'networkidle0' });
const jsCoverage = await page.coverage.stopJSCoverage();

let totalBytes = 0;
let usedBytes = 0;
for (const entry of jsCoverage) {
  totalBytes += entry.text.length;
  for (const range of entry.ranges) {
    usedBytes += range.end - range.start;
  }
}
console.log(`JS coverage: ${((usedBytes / totalBytes) * 100).toFixed(1)}%`);
console.log(`Unused JS: ${((1 - usedBytes / totalBytes) * 100).toFixed(1)}%`);

// --- CSS Coverage ---
await page.coverage.startCSSCoverage();
await page.goto('https://example.com', { waitUntil: 'networkidle0' });
const cssCoverage = await page.coverage.stopCSSCoverage();

let totalCSS = 0;
let usedCSS = 0;
for (const entry of cssCoverage) {
  totalCSS += entry.text.length;
  for (const range of entry.ranges) {
    usedCSS += range.end - range.start;
  }
}
console.log(`CSS coverage: ${((usedCSS / totalCSS) * 100).toFixed(1)}%`);

// Write unused CSS report
const unusedCSS = cssCoverage.map(entry => ({
  url: entry.url,
  total: entry.text.length,
  used: entry.ranges.reduce((a, r) => a + (r.end - r.start), 0),
})).filter(e => e.used / e.total < 0.5);
writeFileSync('unused-css-report.json', JSON.stringify(unusedCSS, null, 2));

await browser.close();
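
The JavaScript and CSS loops above repeat the same arithmetic and divide by zero when a page loads no matching resources. A shared summary function is one way to tidy that up; `summarizeCoverage` is a hypothetical helper that accepts the entry arrays returned by `stopJSCoverage()` or `stopCSSCoverage()`.

```javascript
// Hypothetical helper: per-file usage summary for JS or CSS coverage entries.
function summarizeCoverage(entries) {
  return entries.map((entry) => {
    const total = entry.text.length;
    const used = entry.ranges.reduce((sum, r) => sum + (r.end - r.start), 0);
    return {
      url: entry.url,
      total,
      used,
      // Guard against empty files so the percentage is never NaN.
      usedPercent: total === 0 ? 0 : Number(((used / total) * 100).toFixed(1)),
    };
  });
}

// Usage:
//   const report = summarizeCoverage(cssCoverage)
//     .filter((file) => file.usedPercent < 50);
```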

16. Stealth Plugins and Anti-Detection

For scraping scenarios where sites detect headless browsers, puppeteer-extra with the stealth plugin patches common detection vectors: the navigator.webdriver flag, missing browser plugins, incorrect Chrome runtime properties, and WebGL vendor strings. Always combine stealth with ethical scraping practices.

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

// Apply stealth plugin (patches 10+ detection vectors)
puppeteer.use(StealthPlugin());

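const browser = await puppeteer.launch({
  headless: true,
  args: [
    // Disabling the sandbox weakens isolation; keep these two flags only
    // in containers where the Chrome sandbox cannot start
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-blink-features=AutomationControlled'
  ]
});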

const page = await browser.newPage();

// Randomize viewport to avoid fingerprinting
const viewports = [
  { width: 1920, height: 1080 },
  { width: 1366, height: 768 },
  { width: 1440, height: 900 },
  { width: 1536, height: 864 },
];
const vp = viewports[Math.floor(Math.random() * viewports.length)];
await page.setViewport(vp);

// Rotate user agents
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];
await page.setUserAgent(
  userAgents[Math.floor(Math.random() * userAgents.length)]
);

// Randomize timing between actions
function randomDelay(min = 500, max = 2000) {
  return new Promise(r =>
    setTimeout(r, min + Math.random() * (max - min))
  );
}

await page.goto('https://example.com');
await randomDelay();

// Simulate human-like mouse movement before clicking
await page.mouse.move(100, 200);
await randomDelay(100, 300);
await page.mouse.move(250, 350);
await randomDelay(100, 300);
await page.click('.target-element');

// Verify stealth is working
const isWebdriver = await page.evaluate(
  () => navigator.webdriver
);
console.log('webdriver detected:', isWebdriver); // should be false

await browser.close();

17. Error Handling Patterns

Robust Puppeteer scripts require comprehensive error handling. Common failure modes include navigation timeouts, missing selectors, network errors, and crashed browser instances. Implement retry logic, graceful degradation, and proper cleanup to build reliable automation.

import puppeteer from 'puppeteer';

// Retry wrapper with exponential backoff
async function withRetry(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      const delay = baseDelay * Math.pow(2, attempt - 1);
      console.warn(`Attempt ${attempt} failed: ${error.message}. Retrying in ${delay}ms...`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
}

// Safe navigation with error handling
async function safeGoto(page, url, options = {}) {
  try {
    const response = await page.goto(url, {
      waitUntil: 'networkidle0',
      timeout: 30000,
      ...options
    });

    if (!response) {
      throw new Error(`No response from ${url}`);
    }

    if (!response.ok() && response.status() !== 304) {
      throw new Error(`HTTP ${response.status()} at ${url}`);
    }

    return response;
  } catch (error) {
    if (error.message.includes('net::ERR_')) {
      console.error(`Network error navigating to ${url}: ${error.message}`);
    } else if (error.name === 'TimeoutError') {
      console.error(`Timeout navigating to ${url}`);
    }
    throw error;
  }
}

// Safe element interaction
async function safeClick(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout });
    await page.click(selector);
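  } catch (error) {
    console.error(`Failed to click ${selector}: ${error.message}`);
    // Take a debug screenshot; swallow screenshot failures (e.g. page already
    // closed) so they cannot mask the original error
    await page.screenshot({
      path: `debug-${Date.now()}.png`,
      fullPage: true
    }).catch(() => {});
    throw error;
  }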
}

// Browser lifecycle wrapper: logs unexpected disconnects and always cleans up
async function withBrowserRecovery(task) {
  let browser;
  try {
    browser = await puppeteer.launch({
      args: ['--no-sandbox', '--disable-dev-shm-usage']
    });

    browser.on('disconnected', () => {
      console.error('Browser disconnected unexpectedly');
    });

    // Return the task's result so callers can use it
    return await task(browser);
  } catch (error) {
    console.error('Task failed:', error.message);
    throw error;
  } finally {
    if (browser) {
      try {
        await browser.close();
      } catch {
        // Browser already closed or crashed
      }
    }
  }
}

// Usage example
await withBrowserRecovery(async (browser) => {
  const page = await browser.newPage();
  page.setDefaultTimeout(15000);

  await withRetry(async () => {
    await safeGoto(page, 'https://example.com');
    await safeClick(page, '#dynamic-button');
    const data = await page.$eval('.result', el => el.textContent);
    console.log('Result:', data);
  });
});
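To make the retry timing concrete, the sketch below lists the waits that withRetry sleeps through for a given maxRetries and baseDelay. The helper name backoffSchedule is hypothetical and exists only for illustration; it mirrors the baseDelay * 2^(attempt - 1) formula used in withRetry.

```javascript
// Hypothetical helper: lists the delays (in ms) that withRetry sleeps between attempts.
// No delay follows the final attempt, because withRetry rethrows instead of waiting.
function backoffSchedule(maxRetries = 3, baseDelay = 1000) {
  const delays = [];
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(baseDelay * Math.pow(2, attempt - 1));
  }
  return delays;
}

console.log(backoffSchedule());        // [ 1000, 2000 ]
console.log(backoffSchedule(5, 500));  // [ 500, 1000, 2000, 4000 ]
```

With the defaults, a task that keeps failing spends about 3 seconds waiting in total, on top of the time taken by the three attempts themselves.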

18. Docker and CI/CD Deployment

Running Puppeteer in Docker requires specific system dependencies for Chromium. The configuration below provides a production-ready Dockerfile and GitHub Actions workflow for running Puppeteer in CI/CD environments.

# Dockerfile for Puppeteer
FROM node:20-slim

# Install Chromium dependencies
RUN apt-get update && apt-get install -y \
    chromium \
    fonts-liberation \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libcups2 \
    libdbus-1-3 \
    libgbm1 \
    libnspr4 \
    libnss3 \
    libx11-xcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    xdg-utils \
    --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

# Set Puppeteer to use system Chromium
# (PUPPETEER_SKIP_DOWNLOAD replaces the older PUPPETEER_SKIP_CHROMIUM_DOWNLOAD)
ENV PUPPETEER_SKIP_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Run as non-root user (-m creates a home directory for Chromium's config and crash data)
RUN groupadd -r pptruser && useradd -r -m -g pptruser pptruser
USER pptruser

CMD ["node", "index.js"]

# .github/workflows/puppeteer-tests.yml
name: Puppeteer E2E Tests

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  e2e:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Start application
        run: npm start &

      - name: Wait for server
        run: npx wait-on http://localhost:3000

      - name: Run Puppeteer tests
        run: npm run test:e2e
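        env:
          # Custom variable, not read by Puppeteer itself: the test setup
          # must parse it and pass the flags to puppeteer.launch()
          PUPPETEER_ARGS: '--no-sandbox --disable-setuid-sandbox'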

      - name: Run Lighthouse audit
        run: node lighthouse-audit.js

      - name: Upload screenshots on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: debug-screenshots
          path: debug-*.png
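PUPPETEER_ARGS in the workflow above only has an effect if the test setup turns it into launch options. A minimal sketch of that step, assuming a hypothetical helper named launchOptionsFromEnv and that flags are separated by whitespace:

```javascript
// Hypothetical helper: converts the PUPPETEER_ARGS string set in the workflow
// into an options object suitable for puppeteer.launch().
function launchOptionsFromEnv(env = process.env) {
  const args = (env.PUPPETEER_ARGS || '')
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by extra whitespace
  return { headless: true, args };
}

const options = launchOptionsFromEnv({
  PUPPETEER_ARGS: '--no-sandbox --disable-setuid-sandbox',
});
console.log(options.args); // [ '--no-sandbox', '--disable-setuid-sandbox' ]
```

In a test setup file, the result would be passed along as puppeteer.launch(launchOptionsFromEnv()). When the variable is unset, args is an empty array and the browser launches with its defaults.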

19. Playwright Comparison

Playwright, created by former Puppeteer team members at Microsoft, is the main alternative. Both tools automate browsers, but they differ in architecture, API design, and cross-browser support. The choice depends on your project requirements.

Feature	Puppeteer	Playwright
Browser support	Chrome/Chromium (CDP), Firefox (BiDi default)	Chromium, Firefox, WebKit
Auto-waiting	Manual by default (waitForSelector, etc.); opt-in via page.locator()	Built-in (actions auto-wait)
Selector engine	CSS, XPath, text, ARIA (p-selectors)	CSS, XPath, text, role, test-id
Parallel isolation	Browser contexts	Browser contexts + test fixtures
Network mocking	Request interception API	route() API with glob matching
Mobile emulation	KnownDevices descriptors	devices descriptors + projects
CDP access	Native, first-class	Supported but less direct
Lighthouse integration	Native via CDP port	Requires adapter setup
Test runner	External (Jest, Mocha)	Built-in (@playwright/test)
Languages	JavaScript/TypeScript	JS/TS, Python, Java, C#

When to choose Puppeteer: You need native CDP access, Lighthouse integration, stealth plugins (puppeteer-extra ecosystem), or your project is Chrome-only. Puppeteer is also the right choice for scripts that rely on Chrome-specific DevTools features like tracing, coverage, and performance profiling.

When to choose Playwright: You need cross-browser testing (Safari via WebKit), built-in auto-waiting, a batteries-included test runner, or multi-language support. Playwright's test fixtures and built-in parallelization make it stronger for large test suites. Its API design avoids many common timing pitfalls found in Puppeteer scripts.

20. Real-World Results

348 reports exported

Xiaomi Health app body composition reports extracted via automated Puppeteer navigation and screenshotting. Completed in under 2 hours what would have taken days of manual work, solving an export limitation in the Xiaomi ecosystem.

Lighthouse CI on every PR

Automated performance auditing in CI/CD pipelines catches Core Web Vitals regressions before merge. Performance scores tracked over time with budget thresholds that block substandard PRs from reaching production.

Visual regression detection

Scheduled Puppeteer jobs capture health dashboard screenshots and compare against baselines, catching CSS regressions, data rendering bugs, and broken charts that unit tests cannot detect.

AI-powered web interaction

Puppeteer integrated with OpenClaw gives the AI assistant browser capabilities. It can navigate sites, fill forms, download files, and extract data from JavaScript-rendered pages on demand from WhatsApp or Telegram.
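The budget thresholds mentioned under "Lighthouse CI on every PR" come down to comparing category scores against minimums and failing the job when any score falls short. A minimal sketch, assuming the scores have already been extracted as numbers between 0 and 1 (the scale Lighthouse reports category scores on). The function name checkBudgets and the threshold values are illustrative, not the ones used in the pipeline described above.

```javascript
// Hypothetical budget check: returns one entry per category that misses its minimum.
// A category missing from the scores object counts as a failure.
function checkBudgets(scores, budgets) {
  const failures = [];
  for (const [category, minimum] of Object.entries(budgets)) {
    const actual = scores[category];
    if (actual === undefined || actual < minimum) {
      failures.push({ category, minimum, actual });
    }
  }
  return failures;
}

const failures = checkBudgets(
  { performance: 0.82, accessibility: 0.97 }, // example scores
  { performance: 0.9, accessibility: 0.9 }    // example budgets
);
console.log(failures);
// [ { category: 'performance', minimum: 0.9, actual: 0.82 } ]
```

A CI script would typically exit with a non-zero status when the returned array is not empty, which is what blocks the PR.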

21. Latest Updates (2025-2026)

Puppeteer v24.26.0 and WebDriver BiDi (v23+)

The current release is Puppeteer v24.26.0 (April 2026), bundling Chrome 141 and Firefox 144. WebDriver BiDi has been production-ready since v23 and is the default protocol for Firefox connections. BiDi provides a standardized cross-browser automation protocol, while CDP remains the default for Chrome and offers deeper integration with Chrome-specific features like tracing, coverage, and Lighthouse. Puppeteer is not dropping CDP support; both protocols will coexist.

Stagehand v3: CDP-Native AI Automation

Stagehand v3 removed its Playwright dependency and now talks directly to the browser via CDP, minimizing round-trip time. It introduces a modular driver system that works seamlessly with Puppeteer. You can use Puppeteer Page objects directly with Stagehand's AI-powered methods (act(), extract(), observe()). This architectural shift prioritizes throughput and control over testing-first auto-waiting, making Stagehand v3 a natural companion for production Puppeteer workflows that need AI-driven browser interaction.

Cooperative Intercept Mode

Cooperative Intercept Mode addresses the long-standing conflict that arises when multiple handlers need to process the same network request. It is not new in v24; it has been part of Puppeteer for several major versions. In legacy mode, the first handler to call request.continue(), request.abort(), or request.respond() resolves the request, and any later call on the same request throws because the request is already handled. This made composing independent interceptors (ad blocker + auth header injector + response cache) unreliable.

A handler opts in by passing a numeric priority as the last argument to request.abort(), request.continue(), or request.respond(); there is no separate switch on page.setRequestInterception(). When every handler supplies a priority, Puppeteer runs and awaits all handlers in registration order, then applies the resolution with the highest priority. On a tie, abort wins over respond, which wins over continue. If any handler resolves a request without a priority, legacy behavior applies to that request. This enables composable middleware patterns for network interception.

// Interception is enabled the usual way; cooperative mode is opted into
// per call by passing a numeric priority as the last argument
await page.setRequestInterception(true);

// Handler 1: Block tracking scripts
page.on('request', req => {
  if (req.isInterceptResolutionHandled()) return;
  if (req.url().includes('analytics')) req.abort('failed', 0);
  else req.continue(req.continueRequestOverrides(), 0);
});

// Handler 2: Add auth headers to API calls
page.on('request', req => {
  if (req.isInterceptResolutionHandled()) return;
  if (req.url().startsWith('https://api.example.com')) {
    req.continue({ headers: { ...req.headers(), Authorization: 'Bearer tok' } }, 0);
  } else {
    req.continue(req.continueRequestOverrides(), 0);
  }
});

// Both handlers compose: at equal priority abort beats continue, so
// analytics requests get aborted, and API requests get auth headers
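The resolution rule can be modeled without a browser. The sketch below is a simplified simulation, not Puppeteer's implementation: each handler's decision is represented as an { action, priority } pair, the highest numeric priority wins (as described in Puppeteer's documentation of cooperative mode), and ties are broken abort > respond > continue. The name resolveIntercept is hypothetical.

```javascript
// Tie-break order when priorities are equal: abort > respond > continue.
const ACTION_RANK = { abort: 3, respond: 2, continue: 1 };

// Hypothetical model of the final decision for one request.
// votes: array of { action: 'abort' | 'respond' | 'continue', priority: number }
function resolveIntercept(votes) {
  if (votes.length === 0) return null;
  return votes.reduce((best, vote) => {
    if (vote.priority > best.priority) return vote;
    if (vote.priority === best.priority &&
        ACTION_RANK[vote.action] > ACTION_RANK[best.action]) return vote;
    return best;
  });
}

// Equal priorities: abort beats continue.
console.log(resolveIntercept([
  { action: 'continue', priority: 0 },
  { action: 'abort', priority: 0 },
]).action); // abort

// A higher priority overrides the tie-break order.
console.log(resolveIntercept([
  { action: 'abort', priority: 0 },
  { action: 'continue', priority: 1 },
]).action); // continue
```

The second call shows why priorities matter for composition: a handler that must let a request through, for example one that allowlists a critical script, can outrank a generic blocker by voting with a higher number.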

Related Technologies

Lighthouse: Web Performance Auditing (Performance)
Claude Code: AI-Augmented Development (Development)
OpenClaw: Deploy Your Personal AI Assistant (AI Agents)
Docker: Development to Production Containers (Infrastructure)
GitHub Actions: Automated Workflows (CI/CD)
Stagehand v3 + Browserbase: Production Browser Agents (AI Automation)
