ARCHITECTURE

AI Agent Architecture: From Single Agents to Multi-Agent Systems

A practitioner's guide to building production AI agents. From the Claude Agent SDK and MCP server development to multi-agent orchestration, structured outputs, and hook-driven automation.

By Jose Nobile | Updated 2026-04-23 | 14 min read

Agent Foundations

An AI agent is a system that uses a language model as its reasoning engine, combined with tools that let it take actions in the real world. The critical difference between a chatbot and an agent is autonomy: agents decide which tools to use, in what order, and how to handle failures -- all without human intervention at each step.

The core agent loop follows a 4-phase cycle: (1) read context, (2) reason about what to do, (3) execute a tool, (4) observe the result -- then repeat. This loop underlies the Claude Agent SDK and most modern agent frameworks. The sophistication comes from how you design the tools, manage context, handle errors, and coordinate multiple agents working together.
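The 4-phase loop can be sketched in a few lines of Python. This is a control-flow illustration only: `call_model` and the `TOOLS` registry are hypothetical stand-ins for a real model client and a real tool suite, not the Claude Agent SDK's API.

```python
from typing import Callable

# Hypothetical tool registry: name -> handler. A real SDK attaches
# JSON Schema metadata so the model can pick tools by description.
TOOLS: dict[str, Callable[[str], str]] = {
    "shout": lambda text: text.upper(),
}

def call_model(context: list[str]) -> dict:
    """Stand-in for a model call. A real agent sends `context` to an LLM
    and parses its decision; here we hard-code one tool use, then finish."""
    if any(line.startswith("observation:") for line in context):
        return {"action": "finish", "answer": context[-1]}
    return {"action": "tool", "name": "shout", "input": "hello agents"}

def run_agent(task: str, max_steps: int = 5) -> str:
    context = [f"task: {task}"]                          # 1. read context
    for _ in range(max_steps):
        decision = call_model(context)                   # 2. reason about what to do
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["name"]](decision["input"])  # 3. execute a tool
        context.append(f"observation: {result}")             # 4. observe, then repeat
    return "step budget exhausted"

print(run_agent("demo"))  # -> observation: HELLO AGENTS
```

The `max_steps` cap is the simplest possible stopping condition; real frameworks layer token budgets and quality checks on top of it.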

Modern agent architectures have moved beyond single-agent systems. Production deployments typically involve specialized agents that collaborate: one agent for code generation, another for testing, a third for deployment. This specialization mirrors how human engineering teams work, and it typically produces better results than a single generalist agent.

Claude Agent SDK

The Claude Agent SDK provides a TypeScript/Python framework for building custom AI agents powered by Claude. It handles the agent loop, tool execution, conversation management, and error recovery, letting you focus on defining tools and agent behavior.

SDK

4-Phase Agent Loop

The SDK manages the 4-phase loop (read, reason, act, observe) automatically. You define the system prompt, available tools, and stopping conditions. The SDK handles retries, context window management, and graceful degradation.

SDK

Tool Definition

Tools are defined with JSON Schema for parameters, a description for the model, and an execution handler. The SDK validates inputs, handles errors, and serializes results back to the model.
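To make the shape concrete, here is a minimal sketch of a tool with a JSON Schema parameter spec, a model-facing description, and a handler, plus a tiny validator covering `required` and `pattern` checks. The `get_invoice` tool and the validator are illustrative; a real SDK runs a full JSON Schema validator.

```python
import re

# Hypothetical tool definition in the shape described above:
# name, model-facing description, JSON Schema parameters, handler.
GET_INVOICE = {
    "name": "get_invoice",
    "description": "Fetch an invoice by date. Dates must be YYYY-MM-DD.",
    "parameters": {
        "type": "object",
        "properties": {"date": {"type": "string", "pattern": r"^\d{4}-\d{2}-\d{2}$"}},
        "required": ["date"],
    },
    "handler": lambda args: f"invoice for {args['date']}",
}

def invoke(tool: dict, args: dict) -> str:
    """Validate inputs before running the handler: required keys and
    string patterns only -- a small subset of full JSON Schema."""
    schema = tool["parameters"]
    for key in schema["required"]:
        if key not in args:
            return f"error: missing required parameter '{key}'"
    for key, spec in schema["properties"].items():
        pattern = spec.get("pattern")
        if pattern and key in args and not re.match(pattern, str(args[key])):
            return f"error: '{key}' must match {pattern}"
    return tool["handler"](args)

print(invoke(GET_INVOICE, {"date": "2026-04-23"}))  # invoice for 2026-04-23
print(invoke(GET_INVOICE, {"date": "23/04/2026"}))  # informative pattern error
```

Note that validation failures return strings the model can read and act on, which is exactly what the error-serialization step is for.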

SDK

Conversation Management

Automatic context window management with truncation strategies, message caching, and prompt optimization. The SDK ensures the model always has the most relevant context within token limits.

The SDK supports streaming responses, cancellation, and parallel tool execution. It also provides hooks for observability -- you can log every tool call, measure latency, and track token usage for cost optimization.

MCP Server Development

Building an MCP (Model Context Protocol) server lets you expose any API, database, or system as a tool that AI agents can use. The MCP specification defines a standard interface for tool discovery, invocation, and result formatting.

An MCP server consists of three components:

  • Tools -- Functions the agent can call, each with a name, description, and typed parameter schema
  • Resources -- Data the agent can read, like files, database records, or API responses
  • Prompts -- Pre-built prompt templates the agent can use for common tasks
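As an illustration of the discovery side, a server's tool listing takes roughly this shape. The field names below follow our reading of the MCP specification (`tools`, `name`, `description`, `inputSchema`), but verify against the official SDK before relying on them; the `query_orders` tool is invented for the example.

```python
import json

# Sketch of a tools/list result in the spirit of the MCP spec: each tool
# advertises a name, a model-facing description, and a typed input schema.
tools_list_result = {
    "tools": [
        {
            "name": "query_orders",
            "description": "Look up orders for a customer by email.",
            "inputSchema": {
                "type": "object",
                "properties": {"email": {"type": "string"}},
                "required": ["email"],
            },
        }
    ]
}

def find_tool(result: dict, name: str):
    """Client-side discovery: pick a tool out of the advertised list."""
    return next((t for t in result["tools"] if t["name"] == name), None)

tool = find_tool(tools_list_result, "query_orders")
print(json.dumps(tool["inputSchema"]["required"]))  # ["email"]
```

Agents consume exactly this listing when deciding which server-side tool matches a task, which is why the `description` field does so much work.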

TypeScript MCP Server

Use the official @modelcontextprotocol/sdk package. Define tools with Zod schemas, implement handlers, and run as a stdio or HTTP server. Ideal for Node.js environments and integration with existing TypeScript codebases.

Python MCP Server

Use the mcp Python package with Pydantic models for tool schemas. Supports async handlers, streaming, and integration with FastAPI. Best for data science and ML pipeline integrations.

Testing MCP Servers

Use the MCP Inspector CLI for interactive testing, or write unit tests that mock the transport layer. Validate tool schemas, test error handling, and verify response formats before deploying to production agents.

Tool Orchestration

Tool orchestration is the art of designing a tool suite that enables agents to accomplish complex tasks efficiently. Poor tool design leads to confused agents that waste tokens on trial-and-error. Great tool design leads to agents that complete tasks in 2-3 tool calls instead of 10.

Key principles for tool design:

  • Atomic operations -- Each tool does one thing well. Avoid tools that combine multiple operations.
  • Clear descriptions -- The model selects tools based on descriptions. Ambiguous descriptions lead to wrong tool selection.
  • Typed parameters -- Use JSON Schema constraints (enums, patterns, min/max) to prevent invalid inputs.
  • Informative errors -- Return errors that tell the agent what went wrong and how to fix it. "Invalid date format, expected YYYY-MM-DD" is better than "Error."
  • Composability -- Design tools that chain naturally. The output of one tool should be usable as input to another.
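Two of these principles -- informative errors and composability -- can be shown in a short sketch. Both tools are hypothetical: `fetch_report` returns an actionable error message instead of a bare "Error.", and `summarize` is designed to consume `fetch_report`'s output directly.

```python
from datetime import datetime

def fetch_report(date: str) -> str:
    """Returns actionable errors so the agent can self-correct next call,
    rather than an opaque failure it has to guess about."""
    try:
        parsed = datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        return "error: invalid date format, expected YYYY-MM-DD"
    return f"report:{parsed.date().isoformat()}"

def summarize(report_id: str) -> str:
    """Composable: accepts fetch_report's output as its input, so the
    agent can chain the two tools without reformatting anything."""
    if not report_id.startswith("report:"):
        return "error: expected a report id like 'report:YYYY-MM-DD'"
    return f"summary of {report_id}"

print(fetch_report("04/23/2026"))             # informative, fixable error
print(summarize(fetch_report("2026-04-23")))  # summary of report:2026-04-23
```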

The difference between a 5-tool and a 50-tool suite matters less than tool quality. A well-designed set of 5 tools with clear descriptions and composable outputs will outperform a bloated toolkit where the agent struggles to choose the right tool.

Subagent Patterns

Subagents are child agent instances spawned by a parent agent to handle isolated subtasks. Each subagent gets its own conversation context, tool access, and execution scope. The parent orchestrates by defining what each subagent should do and aggregating their results.

PATTERN

Delegation

Parent identifies a self-contained subtask and delegates it to a specialist subagent. Example: "Research the top 5 competitors" delegated to a research agent with web search tools.

PATTERN

Parallel Execution

Parent spawns multiple subagents simultaneously for independent tasks. Example: generating unit tests, integration tests, and documentation in parallel for a new API endpoint.
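The parallel-execution pattern maps naturally onto a thread pool. In this sketch, `run_subagent` is a hypothetical stand-in for spawning a child agent with its own context; a real implementation would make model calls inside it.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(role: str, task: str) -> str:
    """Stand-in for a child agent run: own context, own tools, own scope."""
    return f"{role} done: {task}"

# Independent subtasks for one new API endpoint, mirroring the example above.
tasks = {
    "unit-tests": "cover the new /payments endpoint",
    "integration-tests": "exercise /payments against a test DB",
    "docs": "document request/response shapes",
}

# Parent spawns all three concurrently, then aggregates their results.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {role: pool.submit(run_subagent, role, t) for role, t in tasks.items()}
    results = {role: f.result() for role, f in futures.items()}

print(results["docs"])  # docs done: document request/response shapes
```

Because the subtasks share no state, failures stay isolated: one future raising an exception does not corrupt the others' results.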

PATTERN

Iterative Refinement

One subagent generates output, another evaluates it, and the cycle repeats until quality criteria are met. Example: code generation + code review loop until no issues found.

PATTERN

Context Isolation

When a parent's context is full, spawn a subagent with a fresh context window and only the relevant information. Prevents context pollution and reduces token costs.

Agent Teams

Agent teams are organized groups of agents with defined roles, communication protocols, and coordination strategies. Unlike ad-hoc subagent spawning, teams have persistent roles and shared understanding of the project.

Effective team structures include:

  • Architect + Builders -- One agent designs the solution, others implement components. The architect reviews all changes before merging.
  • Full-Stack Team -- Separate agents for frontend, backend, database, and testing. Each owns its domain and communicates changes that affect others.
  • Writer + Reviewer -- Every change goes through a two-agent pipeline: one writes, one reviews. The reviewer has stricter quality criteria.
  • Specialist Pool -- A coordinator agent assigns tasks to the most appropriate specialist based on the task type (security, performance, UI, etc.).

In this project, agent teams follow the "Full-Stack Team" pattern. The API agent handles Node.js microservice changes, the Frontend agent manages React components, and the Test agent writes integration and E2E tests. A coordinator agent ensures cross-cutting changes are consistent.

Structured Outputs

Structured outputs force the model to return data in a specific JSON schema, eliminating parsing errors and enabling reliable programmatic consumption of agent results. This is critical for agent-to-agent communication and for integrating agent outputs into existing systems.

Use cases for structured outputs in agent systems:

  • Task decomposition -- Agent returns a typed array of subtasks with priorities, dependencies, and estimated complexity
  • Code review results -- Structured list of issues with severity, file location, description, and suggested fix
  • Data extraction -- Extract specific fields from unstructured text into a typed schema
  • Decision logs -- Agent documents its reasoning, alternatives considered, and confidence level in a structured format

Claude supports structured outputs via the tool_use response format, where you define the expected schema as a tool and the model "calls" it with the structured data. This approach is more reliable than asking the model to output JSON in plain text.
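On the consuming side, the payload the model "calls" the schema-tool with should still be validated before downstream systems touch it. This sketch checks a hypothetical code-review issue against required fields and a severity enum; the schema and payload are illustrative, not a real API response.

```python
# The model "calls" this schema-as-a-tool with its structured answer;
# validate the payload before anything downstream consumes it.
REVIEW_SCHEMA = {
    "required": {"file", "line", "severity", "message"},
    "severity_values": {"info", "warning", "error"},
}

def validate_issue(issue: dict) -> list[str]:
    """Return a list of problems; an empty list means safe to consume."""
    problems = []
    missing = REVIEW_SCHEMA["required"] - issue.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if issue.get("severity") not in REVIEW_SCHEMA["severity_values"]:
        problems.append("severity must be one of info/warning/error")
    return problems

model_payload = {"file": "api/pay.ts", "line": 42, "severity": "error",
                 "message": "unvalidated amount field"}
print(validate_issue(model_payload))   # [] -> safe to consume
print(validate_issue({"file": "x"}))   # two problems reported
```

If validation fails, the problems list can be fed straight back to the model as a correction prompt, closing the loop without human intervention.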

Hook-Driven Automation

Hooks are event-driven scripts that execute at specific points in an agent's lifecycle. They enable "guardrail" automation -- quality checks, security scanning, and compliance enforcement that happen automatically without the agent needing to remember to do them.

HOOK

Pre-Edit Validation

Before any file edit, validate that the target file exists, the agent has permission to modify it, and the change doesn't affect protected files (configs, secrets, migrations).

HOOK

Post-Edit Linting

After every file edit, automatically run the linter, formatter, and type checker. If any fail, inject the error into the agent's context so it can fix the issue immediately.
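A post-edit hook of this kind is a short subprocess wrapper. This sketch uses Python's own `py_compile` syntax check as a stand-in for a project's linter/formatter/type checker; the hook's contract (return checker output on failure, `None` on success) is an assumption about how the orchestrator consumes it.

```python
import os
import subprocess
import sys
import tempfile

def post_edit_hook(path: str):
    """Run a checker after an edit; on failure, return its output so the
    orchestrator can inject it into the agent's context for an immediate fix."""
    proc = subprocess.run(
        [sys.executable, "-m", "py_compile", path],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return proc.stderr.strip()  # feed this text back to the agent
    return None                      # clean edit, nothing to inject

# Demo: an "agent edit" introduces a syntax error; the hook catches it.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("def broken(:\n")
    path = f.name
feedback = post_edit_hook(path)
os.unlink(path)
print("lint feedback captured" if feedback else "clean")
```

Swapping in `eslint`, `ruff`, or `tsc` is a one-line change to the subprocess command; the injection contract stays the same.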

HOOK

Pre-Commit Security

Before committing changes, scan for hardcoded secrets, check dependency vulnerabilities, and validate that no sensitive files are being committed. Block the commit if issues are found.

HOOK

Completion Notification

When an agent completes a task, send a summary to Slack/Discord/email with the changes made, tests run, and any issues encountered. Enables async oversight of agent work.

Permission Design

Permission design determines what an agent can and cannot do. In production systems, this is the primary safety mechanism. The principle of least privilege applies: agents should have the minimum permissions necessary to complete their tasks.

Permission layers in a well-designed agent system:

  • Tool-level -- Which tools the agent can access. A code review agent doesn't need file write or command execution tools.
  • Scope-level -- Which files, directories, or resources the agent can access. Frontend agents can't modify backend code.
  • Action-level -- What operations are allowed per tool. The file tool might allow read everywhere but write only in specific directories.
  • Approval-level -- Which actions require human approval before execution. Destructive operations (delete, deploy) always need approval.
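Three of these layers (scope, action, approval) can be combined in one small policy check. The policy table and paths below are hypothetical, sketching a frontend agent that may read the whole repo but write only its own directory, with destructive actions gated on approval.

```python
from pathlib import PurePosixPath

# Hypothetical policy: action-level roots plus an approval-gated set.
POLICY = {
    "read":  ["/repo"],
    "write": ["/repo/frontend"],
    "needs_approval": {"delete", "deploy"},
}

def is_allowed(action: str, path: str) -> str:
    """Resolve an (action, path) request to allowed / denied / pending-approval."""
    if action in POLICY["needs_approval"]:
        return "pending-approval"            # approval-level gate first
    roots = POLICY.get(action, [])
    target = PurePosixPath(path)
    if any(target.is_relative_to(root) for root in roots):
        return "allowed"                     # scope-level match
    return "denied"                          # least privilege: default deny

print(is_allowed("write", "/repo/frontend/App.tsx"))  # allowed
print(is_allowed("write", "/repo/api/server.ts"))     # denied
print(is_allowed("deploy", "/repo"))                  # pending-approval
```

The important property is the default-deny fall-through: an action or path the policy never mentions is refused, which is the principle of least privilege in code.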

Production Deployment

Deploying agents to production introduces concerns that don't exist in development: reliability, cost control, observability, and safety at scale. Here are the critical considerations:

PROD

Error Recovery

Agents will encounter errors -- API rate limits, malformed responses, tool failures. Design retry strategies with exponential backoff, fallback tools, and graceful degradation paths.
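A minimal retry wrapper with exponential backoff and jitter looks like this. The flaky tool is simulated; delays are shrunk to milliseconds so the sketch runs instantly, and the fallback/degradation path is assumed to live in the caller.

```python
import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.01):
    """Retry a flaky tool call with exponential backoff plus jitter.
    base_delay is tiny here for demonstration; use seconds in production."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise                        # graceful degradation happens upstream
            # delay doubles each attempt: base, 2*base, 4*base, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")   # simulated 429 from a tool API
    return "ok"

print(with_retries(flaky_tool))  # ok (after two simulated failures)
```

The jitter term matters in production: without it, many agents rate-limited at the same moment retry in lockstep and get rate-limited again.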

PROD

Cost Guardrails

Set per-task and per-session token budgets. Implement circuit breakers that stop agents from entering infinite loops. Monitor cost per task to identify optimization opportunities.
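A token-budget circuit breaker can be a small stateful object the agent loop consults before each step. The budget numbers are illustrative; a real implementation would charge actual usage reported by the model API.

```python
class TokenBudget:
    """Per-task circuit breaker: once spend crosses the budget, every
    further charge is refused, halting the agent run before a runaway loop."""
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.spent = 0
        self.tripped = False

    def charge(self, tokens: int) -> bool:
        if self.tripped:
            return False                     # breaker already open
        self.spent += tokens
        if self.spent > self.max_tokens:
            self.tripped = True              # open the breaker
            return False
        return True

budget = TokenBudget(max_tokens=10_000)
steps = 0
while budget.charge(3_000):                  # each loop step "costs" 3k tokens
    steps += 1
print(steps, budget.tripped)                 # 3 True
```

Logging `spent` per task is also the raw data for the cost-per-task monitoring mentioned above.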

PROD

Observability

Log every tool call with inputs, outputs, latency, and token usage. Trace multi-agent conversations to debug coordination failures. Dashboard key metrics: success rate, cost per task, average latency.

PROD

Safety Boundaries

Define hard limits on what agents can do in production. No production database writes without approval. No deployment without passing CI. No external API calls to unauthorized endpoints.

Real Example: Parallel Agent Strategies

The platform uses a multi-agent development strategy coordinated through a 512-line AGENTS.md file. This file serves as both the agent configuration and the development playbook, defining team structures, skill registries, and coordination protocols.

Three-Agent Feature Teams

Every feature is developed by three agents in parallel: API (Node.js/TypeScript backend), UI (React frontend), and QA (integration + E2E tests). Each agent works in its own worktree, merging only when all three are complete.

Cross-Service Coordination

When a change spans multiple microservices (e.g., new payment method), a coordinator agent decomposes the change, assigns sub-tasks to service-specific agents, and validates that all pieces integrate correctly.

8 Registered Skills

The AGENTS.md registers 8 reusable skills: database migration, API endpoint scaffold, test generation, deployment validation, cross-service refactor, performance audit, security scan, and documentation update.

Latest Developments (2025-2026)

PROTOCOL

Google Agent2Agent (A2A) Protocol

Open protocol launched April 2025, joined the Linux Foundation in June 2025, and reached v1.0 in March 2026 with signed Agent Cards and multi-tenancy. See the dedicated A2A Protocol guide for deep coverage. Agent Cards expose capability discovery so agents can find and communicate with each other; task lifecycle management handles handoffs. By April 2026, 150+ organizations, 22k+ GitHub stars, and native support in Azure AI Foundry and AWS Bedrock AgentCore made A2A the de-facto agent interoperability standard.

ARCHITECTURE

A2A + MCP Interplay

A2A and MCP are complementary protocols that operate at different layers. A2A handles agent-to-agent communication (discovery, negotiation, task delegation between agents), while MCP handles agent-to-tool communication (connecting agents to databases, APIs, and services). A production agent system uses both: MCP for tool access and A2A for collaborating with agents from other systems.

RAG

Agentic RAG

Agentic RAG adds an autonomous reasoning layer on top of traditional retrieval-augmented generation. The agent analyzes initial retrieval results, rewrites queries if the results are insufficient, cross-references multiple sources, and iterates until it has high-confidence answers. Combines dense semantic retrieval with sparse keyword retrieval, re-ranking results with Reciprocal Rank Fusion for superior relevance.
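Reciprocal Rank Fusion itself is a few lines: each document scores the sum of 1/(k + rank) across the ranked lists, with k = 60 as the commonly used constant from the original RRF paper. The document IDs below are made up for the example.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists: score(d) = sum over lists of 1/(k + rank(d)).
    Documents ranking well in several lists rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]   # semantic retriever's order
sparse = ["doc_b", "doc_d", "doc_a"]   # keyword retriever's order
print(rrf([dense, sparse]))  # doc_b first: high in both lists
```

Note that RRF needs only ranks, not scores, so it fuses dense and sparse retrievers without any score normalization -- the reason it is the default fusion choice in hybrid retrieval.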

OPERATIONS

Agent Observability

Production agent observability goes beyond logging. Step-level traces capture every reasoning step, tool call, and decision point. Outcome evaluation scores agent results against expected quality criteria. Cost and latency budgets track spend per task and flag anomalies. Tool reliability metrics identify tools that fail or return low-quality results. Safety monitoring detects policy violations and unexpected behaviors in real time.

APRIL 2026

Cursor 3: Agent Management Console

Cursor 3, released April 2, 2026, replaces the traditional IDE with an agent management console. The new Agents Window lets users run many agents in parallel across local, cloud, worktree, and remote SSH environments. Design Mode enables precise UI feedback, and Agent Tabs support multitasking. Bugbot now self-improves in real time with a 78% resolution rate, and real-time reinforcement learning deploys improved Composer checkpoints as often as every five hours. Cursor crossed $2 billion ARR in February 2026.

APRIL 2026

Anthropic Advisor Pattern

Launched April 9, 2026, the advisor tool pairs a fast executor model (Sonnet 4.6 or Haiku 4.5) with a high-intelligence advisor (Opus 4.6) that provides strategic guidance mid-generation for long-horizon agentic workloads. Sonnet with an Opus advisor scored 74.8% on SWE-bench Multilingual versus 72.1% for Sonnet alone, while costing 11.9% less than running Opus solo. This pattern lets production agent systems use cheaper models by default and escalate to Opus only for complex decisions.

APRIL 2026

Claude Managed Agents

Launched April 8, 2026 in beta, Claude Managed Agents is a hosted service where Anthropic runs the agent on your behalf. Instead of building your own agent loop, managing tool execution, and provisioning infrastructure, you define the model, system prompt, tools, MCP servers, and skills — and Managed Agents spins up isolated containers per agent. Pricing is based on Claude model usage plus $0.08 per agent runtime hour. This gets teams to production agent deployments 10x faster than self-hosted infrastructure.

APRIL 2026

Claude Opus 4.7

Released April 16, 2026, Opus 4.7 is Anthropic's newest flagship model optimized for complex reasoning and long-running agent workflows. It scores 87.6% on SWE-bench Verified and 94.2% on GPQA, features a 1M token context window, 3.3x higher-resolution vision (2,576px), a new xhigh effort level for maximum quality, and task budgets for autonomous workloads. Pricing remains $5/$25 per MTok, same as Opus 4.6. Gartner projects $201.9 billion in agentic AI spending for 2026 — a 141% increase over 2025.

LANDSCAPE

Agent Framework Landscape Q2 2026

Every major AI lab now ships its own agent framework: OpenAI Agents SDK (evolved from Swarm), Google ADK, Anthropic Agent SDK, Microsoft Semantic Kernel and AutoGen, and HuggingFace Smolagents. LangGraph appeared in 34% of production architecture documents at companies with 1,000+ employees in Q1 2026 (Gartner). MCP has become the de-facto agent-to-tool communication protocol with support from OpenAI, Google, and 75+ connectors in Claude. Browser Use reached 78K GitHub stars — among the fastest-growing open-source agent tools. For most teams: LangGraph leads complex Python multi-agent orchestration, Mastra leads TypeScript agent development, and CrewAI leads rapid role-based agent prototyping.
