Claude Agent SDK: Building Production-Grade AI Agents

The comprehensive guide to Anthropic's official SDK for building autonomous AI agents. Master the 4-phase agentic loop, built-in tools, custom tool development in Python and TypeScript, MCP integration, structured outputs, vision, streaming, memory, permissions, cost tracking, batch processing, and deployment strategies for production environments.

What Is the Claude Agent SDK?

The Claude Agent SDK is Anthropic's official framework for building autonomous AI agents in Python and TypeScript. It provides the same agentic capabilities that power Claude Code -- file reading, code execution, web browsing, and tool use -- as a programmable library that you embed in your own applications. Python requires 3.10+ and TypeScript requires Node 18+.

At its foundation, the SDK implements a 4-phase agentic loop: (1) the agent receives a task, (2) it reasons about what tools to use, (3) it executes tool calls and collects results, and (4) it evaluates whether the task is complete or needs further iteration. This loop runs autonomously until the agent determines the task is finished or hits a configured limit.

The SDK abstracts away the complexities of tool calling, response parsing, error handling, and conversation management. You define your agent's system prompt, available tools (built-in or custom), and constraints. The SDK handles the rest: constructing API calls, parsing tool use responses, executing tools, feeding results back, managing context window limits, and streaming output to the user. It also integrates natively with MCP servers, allowing your agents to access the same ecosystem of 50+ tool servers available to Claude Code.

The 4-Phase Agentic Loop

1

Receive

Agent receives a task from the user or calling system. The system prompt, available tools, and conversation history form the initial context.

2

Reason

Claude analyzes the task, plans an approach, and decides which tools to call. Extended thinking provides chain-of-thought visibility for complex reasoning.

3

Execute

The SDK dispatches tool calls, collects results, handles errors, and manages timeouts. Multiple tools can execute in parallel when independent.

4

Evaluate

Agent assesses results against the original task. If complete, it formulates a response. If not, it loops back to phase 2 with new context from tool results.

Key Features

CORE

Built-in Tools

File read/write (Read, Write, Edit), code execution (Bash), web browsing (WebFetch, WebSearch), file search (Glob, Grep), and notebook editing (NotebookEdit). These tools give agents the same capabilities as a developer working in a terminal -- reading code, running scripts, searching the web, and modifying files.

BUILD

Custom Tool Development

Define tools as functions with typed parameters, descriptions, and return schemas. In Python, use the @tool decorator with type-hinted arguments. In TypeScript, define tools as objects with name, description, inputSchema (JSON Schema), and an execute function. JSON Schema validates inputs automatically.

BUILD

Multi-Language Support

Official SDKs for Python (3.10+ required, anthropic-agent-sdk) and TypeScript (Node 18+ required, @anthropic-ai/agent-sdk). Both offer identical capabilities with idiomatic APIs. Python uses async/await with asyncio. TypeScript uses native Promises and async iterators.

CORE

MCP and A2A Integration

Connect to any MCP server from within your agent. Tools from MCP servers appear alongside built-in and custom tools. Your custom agents can use Atlassian, GitHub, Slack, databases, and any other MCP-compatible service natively. Supports Stdio, SSE, and Streamable HTTP transports. Pair this with the A2A Protocol to expose your Claude agent as a first-class A2A peer, or to delegate subtasks to specialist agents running in other frameworks and clouds.

UX

User Interaction and Approvals

Support for approval workflows, interactive confirmations, and streaming input. Agents can pause execution to ask the user for input, display progress in real-time, and present structured results. Configurable human-in-the-loop patterns for sensitive operations like file writes, command execution, or API calls.

BUILD

Structured Outputs

Force agent responses to conform to JSON schemas with required fields, types, enums, and nested objects. The SDK validates responses and retries with corrective feedback if the model produces invalid output. Type-safe in both Python (via Pydantic models) and TypeScript (via Zod schemas or JSON Schema). Essential for programmatic consumption.

STATE

Session Management

Manage conversation history, tool state, and agent memory across interactions. Sessions can be persisted to disk or database and resumed later for cross-session recovery. Context window management handles truncation and summarization when conversations grow long. Session IDs enable multi-tenant deployments.

STATE

Memory and Todo Tracking

Agents can persist memories across sessions using built-in memory tools or MCP Memory servers. Key-value storage for user preferences, project context, and learned patterns. Todo tracking lets agents maintain task lists, mark items complete, and resume work across restarts. Memory is scoped per user, project, or global.

SEC

Permissions and Tool Restrictions

Fine-grained control over which tools agents can use, which files they can access, and which commands they can execute. Define allow/deny lists for tools, file path patterns, and command prefixes. Sandboxing options isolate agent execution. Rate limiting prevents runaway resource consumption.

COST

Cost Tracking

Built-in token counting and cost estimation for every API call. Set spending limits per agent, per session, or per task. Track input tokens, output tokens, and cache read/write tokens separately. Alerts when approaching budget thresholds. Optimize costs by choosing appropriate models (Opus for complex reasoning, Haiku for simple tasks).

CORE

Extended Thinking

Enable chain-of-thought reasoning for complex tasks. The agent shows its reasoning process before acting, improving transparency and debuggability. Extended thinking uses additional tokens but produces more reliable results on difficult problems. Configure thinking budget independently from output budget.

CORE

Vision Support

Agents can process images alongside text. Pass screenshots, diagrams, charts, or photos as input. The agent can analyze UI layouts, read text from images, interpret data visualizations, and compare visual designs. Supports PNG, JPEG, GIF, and WebP formats. Useful for visual QA, design review, and document analysis.

BUILD

Streaming vs Single Mode

Choose between streaming mode (real-time token-by-token output) and single mode (complete response at once). Streaming is ideal for interactive UIs where users see progress. Single mode is better for batch processing and pipelines. Both modes support tool use, structured outputs, and extended thinking.

BUILD

Error Handling and Recovery

Automatic retry with exponential backoff for transient API errors. Tool execution errors are fed back to the agent for self-correction. Configurable error budgets prevent infinite retry loops. Graceful degradation when tools are unavailable. Session checkpointing enables recovery from crashes mid-task.

DEPLOY

Deployment Strategies

Deploy as serverless functions (AWS Lambda, Cloud Functions), containerized services (Docker, Kubernetes), long-running daemons, or embedded in existing applications. Horizontal scaling via stateless design with external session storage. The SDK is lightweight with minimal dependencies. Supports edge deployment via Cloudflare Workers.

DEPLOY

Batch Processing

Process multiple tasks in parallel using agent pools. Queue tasks, distribute across agent instances, collect results, and handle failures. Ideal for bulk data processing, mass code review, document analysis at scale, and migration jobs. Combine with structured outputs for consistent machine-parseable results across all items.

How I Use It

I use the Claude Agent SDK to build custom automation agents that go beyond what Claude Code offers out of the box. My primary use case is building specialized agents for clients -- agents that understand their specific business domain, connect to their internal tools, and automate their unique workflows.

For the project, I built a deployment agent that reads the git log, determines which microservices changed, generates Helm chart values, validates Kubernetes manifests, and triggers deployments. This agent runs in a CI/CD pipeline and has reduced deployment errors to near zero because it cross-references changes with the service dependency graph before deploying.

Another production agent handles database migration validation. It reads proposed migration files, analyzes them against the current schema, checks for backward compatibility, estimates execution time on production-sized tables, and generates rollback scripts. This replaced a manual review process that took 30 minutes per migration and occasionally missed breaking changes.

I also build agents with structured JSON outputs for data processing pipelines. These agents analyze unstructured data (emails, documents, logs), extract structured information according to a defined schema, and feed results into downstream systems. The structured output guarantees that every agent response is machine-parseable, eliminating the fragility of regex-based extraction.

Getting Started

Install the SDK and build your first agent in under 20 lines. The Python SDK requires Python 3.10+, and the TypeScript SDK requires Node 18+.

# Install the Python SDK
pip install claude-agent-sdk

# Or the TypeScript SDK
npm install @anthropic-ai/agent-sdk

Create a minimal agent with a system prompt and built-in tools. This agent can read files, search code, and execute commands.

# minimal_agent.py - Python example
from claude_agent_sdk import Agent, BuiltinTools

agent = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="""You are a code review assistant.
    Analyze code for bugs, security issues, and
    style violations. Be concise and actionable.""",
    tools=[
        BuiltinTools.Read,
        BuiltinTools.Glob,
        BuiltinTools.Grep,
        BuiltinTools.Bash
    ],
    max_iterations=20
)

result = agent.run(
    "Review the Python files in ./src/ for common "
    "security vulnerabilities and suggest fixes."
)
print(result.text)

Add a custom tool to extend your agent's capabilities. In Python, tools are defined as decorated functions with typed parameters.

# Custom tool definition (Python)
from claude_agent_sdk import tool

@tool(description="Query the user database")
def query_users(
    filter: str = "Filter expression (e.g. 'active=true')",
    limit: int = "Maximum results to return"
) -> str:
    """Execute a filtered query against the user table."""
    results = db.users.find(parse_filter(filter), limit=limit)
    return json.dumps([u.to_dict() for u in results])

# Add to agent
agent = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="You are a data analyst assistant.",
    tools=[BuiltinTools.Read, query_users],
    max_iterations=10
)

In TypeScript, tools are defined as objects with a name, description, JSON Schema for inputs, and an execute function.

// Custom tool definition (TypeScript)
import { Agent, Tool } from "@anthropic-ai/agent-sdk";

const queryUsers: Tool = {
  name: "query_users",
  description: "Query the user database",
  inputSchema: {
    type: "object",
    properties: {
      filter: {
        type: "string",
        description: "Filter expression (e.g. 'active=true')"
      },
      limit: {
        type: "number",
        description: "Maximum results to return"
      }
    },
    required: ["filter"]
  },
  async execute(input) {
    const results = await db.users.find(
      parseFilter(input.filter),
      { limit: input.limit ?? 10 }
    );
    return JSON.stringify(results);
  }
};

const agent = new Agent({
  model: "claude-sonnet-4-20250514",
  systemPrompt: "You are a data analyst assistant.",
  tools: [queryUsers],
  maxIterations: 10
});

Connect MCP servers to give your agent access to external services directly from code.

# MCP integration example (Python)
from claude_agent_sdk import Agent, BuiltinTools, MCPServer

agent = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="You are a project management assistant.",
    tools=[BuiltinTools.Read, BuiltinTools.Bash],
    mcp_servers=[
        MCPServer(
            name="github",
            command="npx",
            args=["-y", "@modelcontextprotocol/server-github"],
            env={"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]}
        ),
        MCPServer(
            name="slack",
            command="npx",
            args=["-y", "@modelcontextprotocol/server-slack"],
            env={"SLACK_BOT_TOKEN": os.environ["SLACK_TOKEN"]}
        )
    ],
    max_iterations=15
)

# Agent can now use GitHub and Slack tools natively
result = agent.run(
    "Read PR #42 from repo acme/api, summarize the "
    "changes, and post a summary to #dev-updates on Slack."
)

Advanced Techniques

Multi-Agent Pipelines

Build pipelines where specialized agents hand off work. A planner agent breaks tasks into subtasks, worker agents execute each subtask, and a reviewer agent validates results. Each agent has a focused system prompt and tool set, improving reliability over a single general-purpose agent.

# Multi-agent pipeline
planner = Agent(
    model="claude-opus-4-20250514",
    system_prompt="Break tasks into subtasks. Output JSON.",
    tools=[BuiltinTools.Read, BuiltinTools.Glob]
)

worker = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="Execute coding tasks. Write clean code.",
    tools=[BuiltinTools.Read, BuiltinTools.Write,
           BuiltinTools.Bash]
)

reviewer = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="Review code for bugs and style issues.",
    tools=[BuiltinTools.Read, BuiltinTools.Grep]
)

# Orchestrate
plan = planner.run("Refactor auth module for OAuth2")
for subtask in plan.structured_output["subtasks"]:
    result = worker.run(subtask["description"])
    review = reviewer.run(f"Review: {result.text}")

Structured Output Schemas

Define JSON schemas that force agent responses into predictable structures. This is essential for agents that feed into automated pipelines. The SDK validates outputs and retries with corrective feedback if the model produces invalid JSON.

# Structured output with schema
output_schema = {
    "type": "object",
    "properties": {
        "vulnerabilities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "severity": {"enum": ["low","medium","high","critical"]},
                    "file": {"type": "string"},
                    "line": {"type": "integer"},
                    "description": {"type": "string"},
                    "fix": {"type": "string"}
                },
                "required": ["severity","file","line","description","fix"]
            }
        },
        "summary": {"type": "string"}
    },
    "required": ["vulnerabilities", "summary"]
}

agent = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="Security audit agent.",
    tools=[BuiltinTools.Read, BuiltinTools.Grep],
    output_schema=output_schema
)

Streaming and Real-Time Output

Use streaming mode for interactive applications where users need to see progress in real-time. The SDK emits events for each token, tool call start/end, and thinking block. In single mode, you get the complete response at once, which is better for batch processing and automated pipelines where latency per-item is less important than throughput.

# Streaming mode (Python)
async for event in agent.stream("Analyze this codebase"):
    if event.type == "text":
        print(event.text, end="", flush=True)
    elif event.type == "tool_start":
        print(f"\n[Using {event.tool_name}...]")
    elif event.type == "thinking":
        print(f"\n[Thinking: {event.summary}]")

Session Persistence and Recovery

Save session state to disk or a database for cross-session recovery. When an agent crashes or a user returns later, resume from exactly where the conversation left off. Session state includes conversation history, tool results, memory entries, and todo items. Useful for long-running tasks that span multiple user interactions.

# Session persistence (Python)
from claude_agent_sdk import Agent, SessionStore

store = SessionStore(backend="sqlite", path="./sessions.db")

# Resume or start new session
agent = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt="You are a project assistant.",
    tools=[BuiltinTools.Read, BuiltinTools.Write],
    session_store=store,
    session_id="user-123-project-abc"
)

# Agent resumes from last checkpoint if session exists
result = agent.run("Continue where we left off.")

Cost Management Strategies

Implement tiered model selection based on task complexity. Use Haiku for simple extraction, Sonnet for general tasks, and Opus for complex reasoning. Cache tool results across iterations to avoid redundant API calls. Set token budgets per agent to prevent runaway costs. Monitor per-session spending and alert on anomalies.

# Cost tracking (Python)
result = agent.run("Analyze security vulnerabilities")

print(f"Input tokens:  {result.usage.input_tokens}")
print(f"Output tokens: {result.usage.output_tokens}")
print(f"Cache reads:   {result.usage.cache_read_tokens}")
print(f"Total cost:    ${result.usage.total_cost:.4f}")
print(f"Iterations:    {result.iterations}")

Batch Processing

Process large datasets by running multiple agent instances in parallel. Queue tasks, distribute across workers, collect structured results, and handle individual failures without stopping the batch. Combine with structured outputs to guarantee consistent machine-parseable results across all items.

# Batch processing (Python)
import asyncio
from claude_agent_sdk import Agent, BuiltinTools

agent = Agent(
    model="claude-haiku-3-20250307",
    system_prompt="Extract company info from text.",
    output_schema=company_schema
)

async def process_batch(documents):
    tasks = [agent.arun(doc) for doc in documents]
    results = await asyncio.gather(*tasks,
                                    return_exceptions=True)
    return [r.structured_output for r in results
            if not isinstance(r, Exception)]

# Process 500 documents in parallel batches
all_results = await process_batch(documents)

Latest SDK Updates (2025-2026)

CORE

Hooks System

Deterministic processing points in the agent loop: PreToolUse, PostToolUse, PostToolUseFailure, UserPromptSubmit, Stop, SubagentStop, PreCompact, Notification, SubagentStart, and PermissionRequest. Hooks execute shell commands or custom logic at each point, enabling linting after edits, security scanning before commits, and notifications on task completion without modifying agent behavior.

SESSION

Session Management

Automatic disk persistence saves session state after every turn. Continue and resume operations restore agents to their exact prior state. Session fork creates a new session from a copy of an existing session's history, enabling branching exploration where you try different approaches from the same checkpoint without losing the original conversation.

AGENTS

Programmatic Subagents

Define subagents with description fields that tell the parent agent what each subagent specializes in. Claude auto-delegates tasks to the most appropriate subagent based on these descriptions. Subagents can be resumed via session_id, maintaining their state across invocations. This enables persistent specialist agents that build up domain knowledge over time.

BREAKING

v0.1.0 Breaking Change

Starting with v0.1.0, the SDK no longer loads Claude Code's system prompt by default. Agents receive a minimal prompt unless you explicitly request the claude_code preset. This means custom agents are no longer constrained by Claude Code's safety rules and tool descriptions, giving you full control over the system prompt while requiring you to define your own tool access policies.

v0.1.56

v0.1.56 (April 2026)

Latest release adds get_context_usage() method to ClaudeSDKClient for querying context window usage by category. The @tool decorator now supports typing.Annotated for per-parameter descriptions in JSON Schema, improving tool documentation. New session_id option in ClaudeAgentOptions allows specifying custom session IDs for external tracking and session resumption.

v0.1.58

v0.1.58: SessionStore & V2 Preview (April 2026)

Full SessionStore support at parity with TypeScript: a protocol with 5 methods (append, load, list_sessions, delete, list_subkeys), InMemorySessionStore reference implementation, and 9 async store-backed helpers (list_sessions_from_store, fork_session_via_store, etc.). Three reference adapters (S3/JSONL part files, Redis/RPUSH lists, Postgres/asyncpg+jsonb) ship under examples/session_stores/. A 13-contract conformance test harness at claude_agent_sdk.testing.run_session_store_conformance lets third-party adapter authors validate compatibility. delete_session() now removes sibling subagent transcript directories. The TypeScript SDK introduces a simplified V2 interface preview with send() and stream() patterns for easier multi-turn conversations. Auto permission mode added to PermissionMode type.

BETA

Claude Managed Agents (April 2026)

Anthropic launched Managed Agents in public beta with the managed-agents-2026-04-01 header. Instead of building your own agent loop and tool execution, you get a fully managed harness where Claude reads files, runs commands, browses the web, and executes code in a secure sandbox. Server-sent event streaming provides real-time output. This is Anthropic's hosted equivalent of the self-hosted Agent SDK -- ideal for teams that want agentic capabilities without infrastructure management.

BETA

Advisor Tool (April 2026)

The advisor tool (beta header advisor-tool-2026-03-01) pairs a fast executor model (Sonnet 4.6 or Haiku 4.5) with Opus 4.6 as a strategic advisor consulted only for complex decisions. On SWE-bench Multilingual, Sonnet with Opus advisor scored 74.8% versus 72.1% for Sonnet alone, at 11.9% lower cost than running Opus solo. This pattern is a game-changer for production agent systems: use cheap models by default, escalate to Opus intelligence only when needed.

Real-World Results

Zero deployment errors

Custom deployment agent cross-references code changes with service dependency graph, validates Helm charts, and checks Kubernetes manifests before triggering deploys. No manual deployment errors since adoption.

30 min to 2 min reviews

Database migration validation agent reduced review time from 30 minutes to 2 minutes per migration while catching backward compatibility issues that manual review occasionally missed.

100% parseable outputs

Structured output schemas guarantee every agent response is machine-parseable JSON, eliminating the fragility of regex-based extraction in data processing pipelines.

Multi-agent reliability

Specialized agent pipelines (planner + worker + reviewer) produce higher quality results than single general-purpose agents, with each agent focused on its domain of expertise.

Related Technologies