OpenAI Agents SDK: Building Multi-Agent Systems in Python

The definitive guide to the OpenAI Agents SDK -- the lightweight Python framework that evolved from Swarm for building production multi-agent workflows. From agent primitives and handoffs to sandbox execution, tracing, guardrails, sessions, and deployment patterns for long-horizon tasks.

By Jose Nobile | Published 2026-04-18 | 12 min read

What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is a lightweight, production-ready Python framework for building multi-agent workflows. It evolved from Swarm, OpenAI's experimental multi-agent orchestration prototype released in late 2024, and graduated to a production-grade toolkit in March 2025. The SDK provides a minimal set of primitives -- Agents, Handoffs, and Guardrails -- that compose into sophisticated multi-agent systems without the abstraction overhead of heavier frameworks.

The core design philosophy is minimal abstractions, maximum composability. An Agent is an LLM equipped with instructions and tools. A Handoff transfers control between agents explicitly, carrying conversation context through the transition. A Guardrail validates inputs and outputs in parallel with agent execution. A Runner manages the agent loop: it invokes tools, sends results back to the LLM, and loops until the task is complete. This small surface area means the SDK is quick to learn but powerful enough for production multi-agent orchestration.
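The agent loop described above can be sketched in a few lines of plain Python. This is a toy illustration with a scripted fake model, not the SDK's Runner; the names `FakeModel` and `run_loop` are invented for the sketch.

```python
from typing import Callable

class FakeModel:
    """Scripted model: returns a tool call first, then a final answer."""
    def __init__(self, script: list[dict]):
        self.script = script

    def complete(self, messages: list[dict]) -> dict:
        return self.script.pop(0)

def run_loop(model, tools: dict[str, Callable], user_msg: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = model.complete(messages)
        if reply["type"] == "tool_call":
            # Invoke the requested tool and feed the result back to the model.
            result = tools[reply["name"]](**reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["content"]      # a final response ends the loop
    raise RuntimeError("max turns exceeded")

model = FakeModel(script=[
    {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}},
    {"type": "final", "content": "The sum is 5."},
])
answer = run_loop(model, {"add": lambda a, b: str(a + b)}, "What is 2 + 3?")
print(answer)  # The sum is 5.
```

The real Runner layers streaming, handoffs, and guardrail checks onto this same basic cycle.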

The SDK is provider-agnostic in its tool layer but optimized for OpenAI models. It supports the OpenAI Responses and Chat Completions APIs natively, and through community integrations can work with 100+ other LLMs. The Responses API is the recommended primitive going forward -- it combines the best of Chat Completions and Assistants APIs, with the Assistants API slated for deprecation on August 26, 2026 (announced August 2025, one-year sunset). Built-in hosted tools include WebSearchTool for real-time web search, FileSearchTool for RAG over uploaded files, and ComputerTool for desktop automation. As of April 2026, the SDK is Python-first with TypeScript support planned.

April 15, 2026: Model-Native Harness and Sandbox Execution

On April 15, 2026, OpenAI shipped the most significant update to the Agents SDK since its initial release. The update introduces two core capabilities: a model-native harness and native sandbox execution. Together, they transform the SDK from a lightweight orchestration tool into a full platform for long-horizon agent tasks that involve files, code, and system operations.

The model-native harness gives agents configurable memory, sandbox-aware orchestration, and Codex-like filesystem tools as standardized primitives. Agents can now inspect files, run commands, edit code, and work on multi-step tasks within controlled environments -- capabilities that previously required custom infrastructure built on top of the SDK. The harness aligns with the patterns powering OpenAI's Codex product, bringing those same primitives to the open SDK.

The sandbox execution layer provides agents with persistent, isolated workspaces. Each sandbox includes a workspace manifest (describing files, dependencies, and output directories), sandbox-native capabilities (file read/write, command execution, dependency installation), snapshot and resume support for long-running tasks, and storage provider integration (AWS S3, Google Cloud Storage, Azure Blob Storage, Cloudflare R2). Developers can bring their own sandbox or use built-in support for seven providers: Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel.
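The manifest idea can be pictured as a small declarative document. The field names below are invented for illustration; the SDK defines its own schema.

```python
# Hypothetical workspace manifest -- field names are illustrative,
# not the SDK's actual schema.
manifest = {
    "mounts": [
        {"local": "./src", "sandbox": "/workspace/src", "read_only": False},
        {"local": "./data/ref.csv", "sandbox": "/workspace/data/ref.csv", "read_only": True},
    ],
    "dependencies": ["pandas", "pytest"],   # installed when the workspace is created
    "outputs": {"dir": "/workspace/out", "store": "s3://my-bucket/agent-runs/"},
}
```

A sandbox client consumes a description like this to build the workspace: mount the listed files, install the dependencies, and sync the output directory to the configured storage provider.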

Agent Primitives

CORE

Agent

The fundamental building block. An Agent wraps an LLM with instructions (system prompt), a model reference, a set of tools, and a list of agents it can hand off to. Agents are defined declaratively and composed into pipelines. Each agent is a specialist with a focused responsibility.

CORE

Runner

The execution engine that drives the agent loop. A Runner establishes a context, invokes the agent, processes tool calls, sends results back to the LLM, and continues until the agent produces a final response or hands off. Supports streaming, cancellation, and parallel tool execution.

MULTI

Handoffs

The mechanism for multi-agent coordination. A handoff transfers control from one agent to another, carrying conversation context through the transition. Handoffs appear as tools to the LLM, so the model decides when to delegate. Each agent defines which agents it can hand off to, creating explicit collaboration graphs.

SAFETY

Guardrails

Input and output validation that runs in parallel with agent execution and fails fast when checks do not pass. Input guardrails validate user messages before the first agent processes them. Output guardrails validate the final agent response before it reaches the user. Guardrails enable content filtering, PII detection, policy enforcement, and safety checks.
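The parallel, fail-fast behavior can be sketched with asyncio: start the guardrail and the agent concurrently, and cancel the in-flight agent run the moment the guardrail trips. Everything here (`TripwireTriggered`, the toy card-number check) is illustrative, not SDK API.

```python
import asyncio

class TripwireTriggered(Exception):
    pass

async def check_input(text: str) -> None:
    # Toy guardrail: flag long digit runs that look like card numbers.
    if any(tok.isdigit() and len(tok) >= 13 for tok in text.split()):
        raise TripwireTriggered("possible card number in input")

async def slow_agent(text: str) -> str:
    await asyncio.sleep(0.1)             # stand-in for the LLM round trip
    return f"Answer to: {text}"

async def run_with_guardrail(text: str) -> str:
    agent = asyncio.create_task(slow_agent(text))
    guard = asyncio.create_task(check_input(text))
    try:
        await guard                      # the cheap check finishes first
    except TripwireTriggered:
        agent.cancel()                   # fail fast: abandon the in-flight run
        raise
    return await agent

try:
    asyncio.run(run_with_guardrail("charge 4111111111111111 to my card"))
    blocked = None
except TripwireTriggered as exc:
    blocked = str(exc)
print(blocked)  # possible card number in input
```

Because the guardrail runs alongside the agent rather than before it, a clean input pays almost no extra latency.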

TOOLS

Function Tools

Any Python function can become an agent tool. The SDK automatically generates JSON Schema from type annotations and docstrings, validates inputs with Pydantic, and serializes results back to the model. Function tools support async execution, error handling, and context injection via RunContextWrapper.
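The schema-generation idea can be shown with the standard library alone: read the signature, map annotations to JSON Schema types, and treat parameters without defaults as required. The SDK does this with Pydantic; `tool_schema` below is a simplified stdlib sketch, not the SDK's implementation.

```python
import inspect

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    """Derive a JSON-Schema-style tool definition from annotations and docstring."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)        # no default -> the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }

def create_ticket(title: str, priority: str = "medium") -> str:
    """Create a support ticket with the given title and priority."""
    return f"Ticket created: {title} ({priority})"

schema = tool_schema(create_ticket)
print(schema["parameters"]["required"])  # ['title']
```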

MEMORY

Sessions

A persistent memory layer for maintaining working context across agent turns. Sessions handle context length management, conversation history, and continuity automatically. Backed by SQLite (default), OpenAI Conversations API, or custom stores. Supports memory compaction to stay within token limits on long conversations.
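Compaction in miniature: keep the newest turns that fit a token budget and fold the rest into a summary stub. Real compaction summarizes with the model itself and counts tokens properly; this sketch uses a crude word count and invented names.

```python
def compact(history: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages under `budget` tokens; summarize the rest."""
    def tokens(msg: dict) -> int:
        return len(msg["content"].split())   # crude stand-in for a tokenizer

    kept, used = [], 0
    for msg in reversed(history):            # walk newest-first
        if used + tokens(msg) > budget:
            break
        kept.append(msg)
        used += tokens(msg)
    kept.reverse()
    dropped = len(history) - len(kept)
    if dropped:
        # A real implementation would put an LLM-written summary here.
        kept.insert(0, {"role": "system",
                        "content": f"[summary of {dropped} earlier messages]"})
    return kept

history = [{"role": "user", "content": f"message number {i} with some words"}
           for i in range(20)]
compacted = compact(history, budget=30)
print(len(compacted))  # 6
```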

Tool Definition and Function Calling

The Agents SDK turns any Python function into an agent tool with automatic schema generation. Type annotations define the parameter schema, the docstring becomes the tool description, and Pydantic handles validation. This approach eliminates the boilerplate of manually writing JSON Schema definitions for each tool.

from agents import Agent, Runner, function_tool

@function_tool
def lookup_customer(customer_id: str) -> str:
    """Look up a customer by their ID and return their profile."""
    # Your database query logic here
    return f"Customer {customer_id}: Premium tier, active since 2024"

@function_tool
def create_ticket(title: str, priority: str = "medium") -> str:
    """Create a support ticket with the given title and priority."""
    return f"Ticket created: {title} (priority: {priority})"

support_agent = Agent(
    name="Support Agent",
    instructions="You help customers with account issues.",
    tools=[lookup_customer, create_ticket],
)

result = Runner.run_sync(support_agent, "My account is locked, ID: C-12345")

The SDK also integrates with MCP servers natively. Any MCP-compatible server can be connected as a tool source, letting agents use the full ecosystem of MCP tools (databases, APIs, file systems, cloud services) without custom integration code. Combined with function tools, this gives agents access to both custom business logic and the broader MCP ecosystem.

from agents import Agent
from agents.mcp import MCPServerStdio

# Connect to an MCP server as a tool source.
# Call connect() on the server (or use it as an async context manager)
# before running the agent.
mcp_server = MCPServerStdio(
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-github"],
        "env": {"GITHUB_TOKEN": "ghp_..."},
    }
)

agent = Agent(
    name="Code Review Agent",
    instructions="Review pull requests and provide feedback.",
    mcp_servers=[mcp_server],
)

Tracing and Debugging

The Agents SDK includes built-in tracing that is enabled by default. Every agent run automatically records a comprehensive trace of LLM generations, tool calls, handoffs, guardrail evaluations, and custom events. These traces are visualized in the OpenAI Traces dashboard, where you can inspect the full execution graph of any agent run -- from the initial user message through every tool call and handoff to the final response.

Tracing supports customization at multiple levels. Use add_trace_processor() to add custom processors that receive traces and spans for your own analytics pipeline. Integrate with third-party observability platforms like Langfuse, Arize, and Datadog for production monitoring. For background workers, call flush_traces() at the end of each unit of work to ensure immediate export. Tracing can be disabled globally via the OPENAI_AGENTS_DISABLE_TRACING=1 environment variable or per-run via RunConfig.tracing_disabled.

from agents import Agent, Runner, RunConfig
from agents.tracing import TracingProcessor, add_trace_processor

# Custom trace processor for your analytics.
# A processor implements the full TracingProcessor interface.
class MetricsProcessor(TracingProcessor):
    def on_trace_start(self, trace):
        print(f"Agent run started: {trace.trace_id}")

    def on_trace_end(self, trace):
        pass

    def on_span_start(self, span):
        pass

    def on_span_end(self, span):
        # span.span_data carries the typed payload (tool call, generation, ...)
        print(f"Span finished: {type(span.span_data).__name__}")

    def shutdown(self):
        pass

    def force_flush(self):
        pass

add_trace_processor(MetricsProcessor())

# Disable tracing for a specific run
config = RunConfig(tracing_disabled=True)
result = Runner.run_sync(agent, "Hello", run_config=config)

Multi-Agent Patterns: Handoffs and Delegation

Handoffs are the primary mechanism for multi-agent coordination. When an agent defines other agents in its handoffs list, those agents become available as tools the LLM can invoke. The model decides when to delegate based on its instructions and the conversation context. When a handoff occurs, the target agent receives the full conversation history and takes over execution. This creates explicit collaboration graphs where each agent is a specialist.

Common multi-agent patterns include triage routing (a router agent classifies the request and hands off to a specialist), pipeline delegation (agents hand off in sequence, each adding to the conversation), and agents-as-tools (one agent calls another as a tool to get a sub-result without transferring control). The SDK supports all three patterns natively.

from agents import Agent, Runner

# Specialist agents
billing_agent = Agent(
    name="Billing Specialist",
    instructions="Handle billing inquiries, refunds, and payment issues.",
    tools=[lookup_invoice, process_refund],
)

technical_agent = Agent(
    name="Technical Support",
    instructions="Handle technical issues, bugs, and feature questions.",
    tools=[search_docs, create_bug_report],
)

# Triage agent routes to specialists
triage_agent = Agent(
    name="Customer Support Triage",
    instructions="""You are the first point of contact.
    Route billing questions to the Billing Specialist.
    Route technical issues to Technical Support.
    Handle general inquiries yourself.""",
    handoffs=[billing_agent, technical_agent],
)

# The triage agent decides when to hand off
result = Runner.run_sync(triage_agent, "I was charged twice for my subscription")

For more complex orchestration involving multi-agent systems with shared state, the SDK provides RunContextWrapper -- a mutable context object shared by every tool, hook, and agent within a run. This enables agents to share state, accumulate results, and coordinate without relying solely on conversation history. For inter-agent communication across different frameworks or services, the A2A protocol provides a complementary standard.
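The shared-state idea in miniature: a mutable context object handed to every tool, so results accumulate outside the conversation history. `OrderContext` and the tool functions below are invented for illustration, not SDK names.

```python
from dataclasses import dataclass, field

@dataclass
class OrderContext:
    """Mutable state shared by all tools for the duration of one run."""
    customer_id: str
    notes: list[str] = field(default_factory=list)

def lookup(ctx: OrderContext) -> str:
    ctx.notes.append("looked up customer")   # tools mutate the shared state
    return f"profile for {ctx.customer_id}"

def refund(ctx: OrderContext) -> str:
    ctx.notes.append("issued refund")
    return "refund issued"

ctx = OrderContext(customer_id="C-12345")
lookup(ctx)
refund(ctx)
print(ctx.notes)  # ['looked up customer', 'issued refund']
```

In the SDK, the equivalent state rides inside the context object you pass to the Runner, and tools receive it via RunContextWrapper rather than as a plain argument.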

Native Sandbox Execution

The April 2026 update introduced Sandbox Agents -- a beta surface for running agents in persistent, isolated workspaces. Each sandbox gives the agent a proper filesystem where it can read and write files, install dependencies, run commands, and execute code. This eliminates the need for developers to build custom execution layers for agents that work with code, documents, or system operations.

The sandbox architecture uses a Manifest abstraction for describing the agent's workspace. A manifest specifies which local files to mount, where to write outputs, and how to bring in data from storage providers (S3, GCS, Azure Blob, Cloudflare R2). Sandbox clients handle the lifecycle: create a workspace, mount files, execute agent runs, take snapshots, and resume from snapshots for long-running tasks.
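The create/snapshot/resume lifecycle can be shown against a toy in-memory client. The class and method names below mirror the lifecycle described above but are invented; a real provider's client API will differ.

```python
import copy

class ToySandbox:
    """In-memory stand-in for a sandbox client: create, write, snapshot, resume."""
    def __init__(self):
        self._workspaces, self._snapshots, self._next = {}, {}, 0

    def create(self, manifest: dict) -> int:
        self._next += 1
        self._workspaces[self._next] = {"files": dict(manifest.get("files", {}))}
        return self._next

    def write(self, ws_id: int, path: str, data: str) -> None:
        self._workspaces[ws_id]["files"][path] = data

    def snapshot(self, ws_id: int) -> int:
        snap_id = len(self._snapshots) + 1
        self._snapshots[snap_id] = copy.deepcopy(self._workspaces[ws_id])
        return snap_id

    def resume(self, snap_id: int) -> int:
        self._next += 1
        self._workspaces[self._next] = copy.deepcopy(self._snapshots[snap_id])
        return self._next

    def files(self, ws_id: int) -> list[str]:
        return sorted(self._workspaces[ws_id]["files"])

sbx = ToySandbox()
ws = sbx.create({"files": {"main.py": "print('hi')"}})
sbx.write(ws, "notes.md", "step 1 done")
snap = sbx.snapshot(ws)        # checkpoint mid-task
ws2 = sbx.resume(snap)         # later: pick up exactly where we left off
print(sbx.files(ws2))          # ['main.py', 'notes.md']
```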

Seven sandbox providers are supported out of the box: Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. Developers can also bring their own sandbox by implementing the sandbox client interface. Pricing follows standard API rates based on tokens and tool use -- there are no separate sandbox fees from OpenAI (sandbox provider costs apply separately).

Comparison: OpenAI Agents SDK vs Claude Agent SDK vs Google ADK vs LangGraph

The AI agent framework landscape in 2026 includes four major players, each with distinct strengths and trade-offs. The right choice depends on your model preferences, orchestration complexity, and production requirements.

| Dimension | OpenAI Agents SDK | Claude Agent SDK | Google ADK | LangGraph |
|---|---|---|---|---|
| Language | Python (TS planned) | TypeScript, Python | Python, TS, Java, Go | Python, TypeScript |
| Model lock-in | OpenAI-optimized, 100+ via integrations | Claude models only | Gemini-optimized, supports others | Fully model-agnostic |
| Multi-agent | Handoffs (explicit delegation) | Subagents, agent teams | A2A Agent Cards, Vertex AI | Graph-based state machines |
| Safety | Guardrails (input/output) | Extended thinking, safety-first | Vertex AI guardrails | Custom via nodes |
| Observability | Built-in tracing dashboard | Hook-driven logging | Cloud Trace integration | LangSmith (best-in-class) |
| Sandbox | 7 native providers | OS-level access, no sandbox abstraction | Vertex AI Engine | BYO execution |
| Memory | Sessions (SQLite, API, compaction) | CLAUDE.md, conversation context | Vertex AI sessions | Checkpointing with time travel |
| Best for | Fast prototyping, OpenAI ecosystem | Safety-critical, OS-level automation | Google Cloud, enterprise scale | Complex stateful workflows |

The OpenAI Agents SDK excels at rapid development with minimal abstractions -- you can go from zero to a working multi-agent system in under 50 lines. The Claude Agent SDK has the deepest OS access with built-in file and shell tools, making it the strongest choice for computer-use agents in safety-critical domains. Google ADK offers the broadest language support and integrates tightly with Vertex AI for managed deployment. LangGraph provides the highest production readiness with checkpointing, time travel, and LangSmith observability, at the cost of more complex graph-based abstractions.

Production Deployment Patterns

Deploying Agents SDK workflows to production requires addressing reliability, observability, cost management, and security. The SDK provides several built-in mechanisms, but production deployments typically combine these with external infrastructure.

Session Persistence

Use SQLiteSession with a file-based database for single-server deployments, or implement a custom session backend (Redis, PostgreSQL) for distributed systems. The OpenAIConversationsSession offloads state management to OpenAI's API. For long conversations, enable OpenAIResponsesCompactionSession to automatically compact history and stay within token limits.
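A custom backend boils down to a small contract: load the history for a session id, append new items. The sketch below uses an in-memory dict so it stays self-contained; the class and method names follow that pattern but are not the SDK's exact session protocol (which is async). Swapping the dict for Redis or PostgreSQL is the distributed-deployment version of the same interface.

```python
class InMemorySession:
    """Minimal session backend: per-session message lists keyed by id."""
    def __init__(self, session_id: str, store: dict):
        self.session_id, self.store = session_id, store
        store.setdefault(session_id, [])

    def get_items(self) -> list[dict]:
        return list(self.store[self.session_id])

    def add_items(self, items: list[dict]) -> None:
        self.store[self.session_id].extend(items)

store = {}                                   # stands in for Redis/PostgreSQL
s = InMemorySession("user-42", store)
s.add_items([{"role": "user", "content": "hi"}])

s = InMemorySession("user-42", store)        # a "new server" sees prior history
print(len(s.get_items()))  # 1
```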

Guardrail Strategy

Layer input guardrails for content policy enforcement (PII detection, prompt injection detection, topic restriction) and output guardrails for response quality (format validation, factual grounding, safety checks). Guardrails run in parallel with agent execution and fail fast, minimizing latency impact while ensuring safety.

Cost Control

Monitor token usage through tracing spans. Set maximum turn limits on the Runner to prevent runaway loops. Use cheaper models for triage agents and reserve expensive models for specialist agents that need maximum capability. Session compaction reduces context size and per-request costs on long conversations.
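The model-tiering math is worth doing explicitly. The per-million-token prices below are placeholders, not real rates; substitute your current rate card.

```python
# Hypothetical prices, $ per 1M tokens -- replace with your actual rate card.
CHEAP, PREMIUM = 0.40, 10.00

def run_cost(triage_tokens: int, specialist_tokens: int) -> float:
    """Dollar cost of a run split across a cheap triage model and a premium specialist."""
    return (triage_tokens * CHEAP + specialist_tokens * PREMIUM) / 1_000_000

# Tiered: triage handles classification cheaply, specialist does the real work.
tiered = run_cost(triage_tokens=2_000, specialist_tokens=8_000)
# Flat: every token goes through the premium model.
flat = run_cost(triage_tokens=0, specialist_tokens=10_000)
print(f"${tiered:.4f} vs ${flat:.4f}")  # $0.0808 vs $0.1000
```

At these assumed rates, routing even a fifth of the tokens to the cheap model shaves roughly 20% off per-run cost, and the gap widens as triage absorbs more of the conversation.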

Durable Execution with Temporal

For mission-critical workflows, integrate with Temporal for durable execution. The official OpenAI Agents SDK + Temporal integration wraps agent runs in Temporal workflows, providing automatic retries, timeout handling, and crash recovery. Each tool call becomes a Temporal activity with its own retry policy, ensuring that long-running agent tasks survive infrastructure failures.
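The durable-execution idea in miniature: wrap each tool call in a retry policy with exponential backoff. Temporal does this (plus crash recovery and persistence) at the workflow layer; this stand-alone sketch only shows the retry behavior, with invented names.

```python
import time

def with_retries(fn, *, attempts: int = 3, base_delay: float = 0.01):
    """Wrap a tool call with a simple exponential-backoff retry policy."""
    def wrapped(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == attempts:
                    raise                # out of retries: surface the failure
                time.sleep(base_delay * 2 ** (attempt - 1))  # back off and retry
    return wrapped

calls = {"n": 0}

def flaky_tool() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky_tool)()
print(result, calls["n"])  # ok 3
```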

Sandbox Security

When using sandbox execution, treat each sandbox as an untrusted environment. Mount only the files the agent needs via the workspace manifest. Use read-only mounts for reference data. Set resource limits (CPU, memory, disk) at the sandbox provider level. Capture all sandbox outputs for audit logging. Snapshots enable checkpoint/restore for long-horizon tasks without leaving state in running sandboxes.

Getting Started

The OpenAI Agents SDK requires Python 3.10 or newer. Install with pip and create your first agent in a few lines.

# Install the SDK
pip install openai-agents

# Optional: voice support
pip install "openai-agents[voice]"

# Optional: Redis session backend
pip install "openai-agents[redis]"

# minimal_agent.py -- Your first agent
from agents import Agent, Runner

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant. Be concise.",
)

result = Runner.run_sync(agent, "What is the capital of France?")
print(result.final_output)  # "Paris"

For production use, set the OPENAI_API_KEY environment variable and configure tracing to export to your observability platform. The SDK documentation at openai.github.io/openai-agents-python covers all primitives, patterns, and integration guides in depth.

Related Technologies