Skip to content

Commit 2d665c9

Browse files
sdcoffeyalfozanadityasingh2400andi-oaiaron-cf
authored
Sandbox Agents (#2889)
### Sandbox Agents This release adds **Sandbox Agents**, a beta SDK surface for running agents with a persistent, isolated workspace. Sandbox agents keep the normal `Agent` and `Runner` flow, but add workspace manifests, sandbox-native capabilities, sandbox clients, snapshots, and resume support so agents can work over real files, run commands, edit repositories, generate artifacts, and continue work across runs. Key pieces: - `SandboxAgent`: an `Agent` with sandbox defaults such as `default_manifest`, sandbox instructions, capabilities, and `run_as`. - `Manifest`: a fresh-workspace contract for files, directories, local files, local directories, Git repos, environment, users, groups, and mounts. - `SandboxRunConfig`: per-run sandbox wiring for client creation, live session injection, serialized session resume, manifest overrides, snapshots, and materialization concurrency limits. - Built-in capabilities for shell access, filesystem editing and image inspection, skills, memory, and compaction. - Workspace snapshots and serialized sandbox session state for reconnecting to existing work or seeding a fresh sandbox from saved contents. ### Sandbox clients and hosted providers Sandbox agents now support local, containerized, and hosted execution backends: - `UnixLocalSandboxClient` for fast local development. - `DockerSandboxClient` for container isolation and image parity. - Hosted sandbox clients for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel through optional extras. The release also adds provider-specific examples and mount strategies for common storage backends, including S3, Cloudflare R2, Google Cloud Storage, Azure Blob Storage, and S3 Files where supported by the selected backend. ### Sandbox memory Adds a sandbox memory capability that lets future sandbox-agent runs learn from prior runs. Memory stores extracted lessons in the sandbox workspace, injects a concise summary into later runs, and uses progressive disclosure so agents can search deeper rollout summaries only when useful. Memory supports: - Read-only or generate-only modes. - Live updates when the agent discovers stale memory. - Multi-turn grouping through `conversation_id`, SDK `Session`, `RunConfig.group_id`, or generated run IDs. - Separate memory layouts for isolating memory across agents or workflows. - S3-backed examples for persisted memory across runs. ### Workspace mounts, snapshots, and resume This release adds a full workspace entry and mount model for sandbox sessions: - Local files and directories. - Synthetic files and directories. - Git repository entries. - Remote storage mounts for S3, R2, GCS, Azure Blob Storage, and S3 Files. - Provider-specific mount strategies across Docker, Modal, Cloudflare, Blaxel, Daytona, E2B, and Runloop. - Portable snapshots with path normalization, symlink preservation, mount-safe snapshotting, and remote snapshot support. - Resume paths through runner-managed `RunState`, explicit `SandboxSessionState`, or saved snapshots. ### Examples and tutorials Adds a large `examples/sandbox/` suite covering: - Local Unix and Docker sandbox runners. - Docker mount smoke tests for S3, GCS, Azure Blob Storage, and S3 Files. - Sandbox coding tasks with skills. - Sandbox agents as tools and handoff patterns. - Memory examples, including multi-agent/multi-turn memory and S3-backed memory. - Tax-prep and healthcare-support workflows. - Dataroom QA and metric extraction tutorials. - Repository code review tutorial. - Vision website clone tutorial. - Provider examples for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, Temporal, and Vercel. ### Runtime, tracing, and model plumbing The release includes the runtime plumbing needed to make sandbox agents work naturally inside the existing SDK: - Runner-managed sandbox preparation, capability binding, session lifecycle, state serialization, and resume behavior. - Sandbox-aware `RunState` serialization. - Unified sandbox tracing with SDK spans. - Token usage on tracing spans. - Runner-managed prompt cache key defaults. - OpenAI agent registration and harness ID configuration. - Safer redaction of sensitive MCP tool outputs when sensitive tracing is disabled. - Additional OpenAI client/model utilities and Chat Completions coverage. ## Documentation & Other Changes - docs: add Asqav to external tracing processors list. - docs: update translated document pages. Co-authored-by: Abdulrahman Alfozan <alfozan@openai.com> Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com> Co-authored-by: Andi Liu <andi@openai.com> Co-authored-by: Aron <263346377+aron-cf@users.noreply.github.com> Co-authored-by: ashwinnathan-openai <ashwinnathan@openai.com> Co-authored-by: Codex <noreply@openai.com> Co-authored-by: cploujoux <cploujoux@blaxel.ai> Co-authored-by: elainegan-openai <168589666+elainegan-openai@users.noreply.github.com> Co-authored-by: Elias Freider <freider@users.noreply.github.com> Co-authored-by: Erik Dunteman <erik@erikds-macbook-air.local> Co-authored-by: Jason Liu <jasonliu@openai.com> Co-authored-by: Jason Steving <32336750+jasonsteving99@users.noreply.github.com> Co-authored-by: Kazuhiro Sera <seratch@openai.com> Co-authored-by: Lovre Pešut <lovre.pesut@gmail.com> Co-authored-by: Lucas Wang <lucas_wang@lucas-futures.com> Co-authored-by: Matt Brockman <matt.brockman@e2b.dev> Co-authored-by: Mish Ushakov <mishushakov@users.noreply.github.com> Co-authored-by: Naresh <ghostwriternr@gmail.com> Co-authored-by: nicholasclark-openai <nicholasclark@openai.com> Co-authored-by: qiyaoq-oai <qiyaoq@openai.com> Co-authored-by: Scott Trinh <scott@scotttrinh.com> Co-authored-by: tode-rl <tony@runloop.ai> Co-authored-by: Wendy Jiao <wendyjiao@openai.com>
1 parent 86739b1 commit 2d665c9

File tree

459 files changed

+95144
-1523
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

459 files changed

+95144
-1523
lines changed

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ A clear and concise description of what the bug is.
1717

1818
### Debug information
1919
- Agents SDK version: (e.g. `v0.0.3`)
20-
- Python version (e.g. Python 3.10)
20+
- Python version (e.g. Python 3.14)
2121

2222
### Repro steps
2323

.github/ISSUE_TEMPLATE/model_provider.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ A clear and concise description of what the question or bug is.
1717

1818
### Debug information
1919
- Agents SDK version: (e.g. `v0.0.3`)
20-
- Python version (e.g. Python 3.10)
20+
- Python version (e.g. Python 3.14)
2121

2222
### Repro steps
2323
Ideally provide a minimal python script that can be run to reproduce the issue.

AGENTS.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,7 @@ The OpenAI Agents Python repository provides the Python Agents SDK, examples, an
9191
- `src/agents/run_state.py` (RunState serialization/deserialization)
9292
- `src/agents/run_internal/session_persistence.py` (session save/rewind)
9393
- If the serialized RunState shape changes, update `CURRENT_SCHEMA_VERSION` in `src/agents/run_state.py` and the related serialization/deserialization logic. Keep released schema versions readable, and feel free to renumber or squash unreleased schema versions before release when those intermediate snapshots are intentionally unsupported.
94+
- When bumping `CURRENT_SCHEMA_VERSION`, also add or update the matching entry in `SCHEMA_VERSION_SUMMARIES` in `src/agents/run_state.py` so every supported version keeps a short historical note describing what changed in that schema.
9495

9596
## Operation Guide
9697

CLAUDE.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

CLAUDE.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
AGENTS.md

README.md

Lines changed: 28 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ The OpenAI Agents SDK is a lightweight yet powerful framework for building multi
1010
### Core concepts:
1111

1212
1. [**Agents**](https://openai.github.io/openai-agents-python/agents): LLMs configured with instructions, tools, guardrails, and handoffs
13+
1. [**Sandbox Agents**](https://openai.github.io/openai-agents-python/sandbox_agents): Agents preconfigured to work with a container to perform work over long time horizons.
1314
1. **[Agents as tools](https://openai.github.io/openai-agents-python/tools/#agents-as-tools) / [Handoffs](https://openai.github.io/openai-agents-python/handoffs/)**: Delegating to other agents for specific tasks
1415
1. [**Tools**](https://openai.github.io/openai-agents-python/tools/): Various Tools let agents take actions (functions, MCP, hosted tools)
1516
1. [**Guardrails**](https://openai.github.io/openai-agents-python/guardrails/): Configurable safety checks for input and output validation
@@ -45,19 +46,36 @@ uv add openai-agents
4546

4647
For voice support, install with the optional `voice` group: `uv add 'openai-agents[voice]'`. For Redis session support, install with the optional `redis` group: `uv add 'openai-agents[redis]'`.
4748

48-
## Run your first agent
49+
## Run your first Sandbox Agent
4950

50-
```python
51-
from agents import Agent, Runner
52-
53-
agent = Agent(name="Assistant", instructions="You are a helpful assistant")
51+
[Sandbox Agents](https://openai.github.io/openai-agents-python/sandbox_agents) are new in version 0.14.0. A sandbox agent is an agent that uses a computer environment to perform real work with a filesystem, in an environment you configure and control. Sandbox agents are useful when the agent needs to inspect files, run commands, apply patches, or carry workspace state across longer tasks.
5452

55-
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
53+
```python
54+
from agents import Runner
55+
from agents.run import RunConfig
56+
from agents.sandbox import Manifest, SandboxAgent, SandboxRunConfig
57+
from agents.sandbox.entries import GitRepo
58+
from agents.sandbox.sandboxes import UnixLocalSandboxClient
59+
60+
agent = SandboxAgent(
61+
name="Workspace Assistant",
62+
instructions="Inspect the sandbox workspace before answering.",
63+
default_manifest=Manifest(
64+
entries={
65+
"repo": GitRepo(repo="openai/openai-agents-python", ref="main"),
66+
}
67+
),
68+
)
69+
70+
result = Runner.run_sync(
71+
agent,
72+
"Inspect the repo README and summarize what this project does.",
73+
# Run this agent on the local filesystem
74+
run_config=RunConfig(sandbox=SandboxRunConfig(client=UnixLocalSandboxClient())),
75+
)
5676
print(result.final_output)
5777

58-
# Code within the code,
59-
# Functions calling themselves,
60-
# Infinite loop's dance.
78+
# This project provides a Python SDK for building multi-agent workflows.
6179
```
6280

6381
(_If running this, ensure you set the `OPENAI_API_KEY` environment variable_)
@@ -88,4 +106,4 @@ We also rely on the following tools to manage the project:
88106
- [pytest](https://github.com/pytest-dev/pytest) and [Coverage.py](https://github.com/coveragepy/coveragepy)
89107
- [MkDocs](https://github.com/squidfunk/mkdocs-material)
90108

91-
We're committed to continuing to build the Agents SDK as an open source framework so others in the community can expand on our approach.
109+
We're committed to continuing to build the Agents SDK as an open source framework so others in the community can expand on our approach.

docs/agents.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,9 @@
22

33
Agents are the core building block in your apps. An agent is a large language model (LLM) configured with instructions, tools, and optional runtime behavior such as handoffs, guardrails, and structured outputs.
44

5-
Use this page when you want to define or customize a single agent. If you are deciding how multiple agents should collaborate, read [Agent orchestration](multi_agent.md).
5+
Use this page when you want to define or customize a single plain `Agent`. If you are deciding how multiple agents should collaborate, read [Agent orchestration](multi_agent.md). If the agent should run inside an isolated workspace with manifest-defined files and sandbox-native capabilities, read [Sandbox agent concepts](sandbox/guide.md).
6+
7+
The SDK uses the Responses API by default for OpenAI models, but the distinction here is orchestration: `Agent` plus `Runner` lets the SDK manage turns, tools, guardrails, handoffs, and sessions for you. If you want to own that loop yourself, use the Responses API directly instead.
68

79
## Choose the next guide
810

@@ -12,6 +14,7 @@ Use this page as the hub for agent definition. Jump to the adjacent guide that m
1214
| --- | --- |
1315
| Choose a model or provider setup | [Models](models/index.md) |
1416
| Add capabilities to the agent | [Tools](tools.md) |
17+
| Run an agent against a real repo, document bundle, or isolated workspace | [Sandbox agents quickstart](sandbox_agents.md) |
1518
| Decide between manager-style orchestration and handoffs | [Agent orchestration](multi_agent.md) |
1619
| Configure handoff behavior | [Handoffs](handoffs.md) |
1720
| Run turns, stream events, or manage conversation state | [Running agents](running_agents.md) |
@@ -57,6 +60,8 @@ agent = Agent(
5760
)
5861
```
5962

63+
Everything in this section applies to `Agent`. `SandboxAgent` builds on the same ideas, then adds `default_manifest`, `base_instructions`, `capabilities`, and `run_as` for workspace-scoped runs. See [Sandbox agent concepts](sandbox/guide.md).
64+
6065
## Prompt templates
6166

6267
You can reference a prompt template created in the OpenAI platform by setting `prompt`. This works with OpenAI models using the Responses API.
84.2 KB
Loading

docs/config.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,13 @@
22

33
This page covers SDK-wide defaults that you usually set once during application startup, such as the default OpenAI key or client, the default OpenAI API shape, tracing export defaults, and logging behavior.
44

5+
These defaults still apply to sandbox-based workflows, but sandbox workspaces, sandbox clients, and session reuse are configured separately.
6+
57
If you need to configure a specific agent or run instead, start with:
68

9+
- [Agents](agents.md) for instructions, tools, output types, handoffs, and guardrails on a plain `Agent`.
710
- [Running agents](running_agents.md) for `RunConfig`, sessions, and conversation-state options.
11+
- [Sandbox agents](sandbox/guide.md) for `SandboxRunConfig`, manifests, capabilities, and sandbox-client-specific workspace setup.
812
- [Models](models/index.md) for model selection and provider configuration.
913
- [Tracing](tracing.md) for per-run tracing metadata and custom trace processors.
1014

docs/index.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ Here are the main features of the SDK:
2020
- **Agent loop**: A built-in agent loop that handles tool invocation, sends results back to the LLM, and continues until the task is complete.
2121
- **Python-first**: Use built-in language features to orchestrate and chain agents, rather than needing to learn new abstractions.
2222
- **Agents as tools / Handoffs**: A powerful mechanism for coordinating and delegating work across multiple agents.
23+
- **Sandbox agents**: Run specialists inside real isolated workspaces with manifest-defined files, sandbox client choice, and resumable sandbox sessions.
2324
- **Guardrails**: Run input validation and safety checks in parallel with agent execution, and fail fast when checks do not pass.
2425
- **Function tools**: Turn any Python function into a tool with automatic schema generation and Pydantic-powered validation.
2526
- **MCP server tool calling**: Built-in MCP server tool integration that works the same way as function tools.
@@ -28,6 +29,23 @@ Here are the main features of the SDK:
2829
- **Tracing**: Built-in tracing for visualizing, debugging, and monitoring workflows, with support for the OpenAI suite of evaluation, fine-tuning, and distillation tools.
2930
- **Realtime Agents**: Build powerful voice agents with `gpt-realtime-1.5`, automatic interruption detection, context management, guardrails, and more.
3031

32+
## Agents SDK or Responses API?
33+
34+
The SDK uses the Responses API by default for OpenAI models, but it adds a higher-level runtime around model calls.
35+
36+
Use the Responses API directly when:
37+
38+
- you want to own the loop, tool dispatch, and state handling yourself
39+
- your workflow is short-lived and mainly about returning the model's response
40+
41+
Use the Agents SDK when:
42+
43+
- you want the runtime to manage turns, tool execution, guardrails, handoffs, or sessions
44+
- your agent should produce artifacts or operate across multiple coordinated steps
45+
- you need a real workspace or resumable execution through [Sandbox agents](sandbox_agents.md)
46+
47+
You do not need to choose one globally. Many applications use the SDK for managed workflows and call the Responses API directly for lower-level paths.
48+
3149
## Installation
3250

3351
```bash
@@ -59,6 +77,7 @@ export OPENAI_API_KEY=sk-...
5977

6078
- Build your first text-based agent with the [Quickstart](quickstart.md).
6179
- Then decide how you want to carry state across turns in [Running agents](running_agents.md#choose-a-memory-strategy).
80+
- If the task depends on real files, repos, or isolated per-agent workspace state, read the [Sandbox agents quickstart](sandbox_agents.md).
6281
- If you are deciding between handoffs and manager-style orchestration, read [Agent orchestration](multi_agent.md).
6382

6483
## Choose your path
@@ -69,6 +88,7 @@ Use this table when you know the job you want to do, but not which page explains
6988
| --- | --- |
7089
| Build the first text agent and see one complete run | [Quickstart](quickstart.md) |
7190
| Add function tools, hosted tools, or agents as tools | [Tools](tools.md) |
91+
| Run a coding, review, or document agent inside a real isolated workspace | [Sandbox agents quickstart](sandbox_agents.md) and [Sandbox clients](sandbox/clients.md) |
7292
| Decide between handoffs and manager-style orchestration | [Agent orchestration](multi_agent.md) |
7393
| Keep memory across turns | [Running agents](running_agents.md#choose-a-memory-strategy) and [Sessions](sessions/index.md) |
7494
| Use OpenAI models, websocket transport, or non-OpenAI providers | [Models](models/index.md) |

0 commit comments

Comments
 (0)