50 commits
083cedf
chore: update project agent configuration
jherr Apr 16, 2026
57a8abd
chore: update project agent configuration
jherr Apr 16, 2026
922d64d
chore: update project agent configuration
jherr Apr 16, 2026
701e128
feat: add @tanstack/ai-claude-code harness adapter and coding-agent e…
jherr Jun 13, 2026
b1627bc
feat: add @tanstack/ai-codex and @tanstack/ai-gemini-cli harness adap…
jherr Jun 13, 2026
d61590e
feat: add @tanstack/ai-opencode harness adapter
jherr Jun 13, 2026
6ceee25
Merge remote-tracking branch 'origin/main' into coding-agent-drivers
AlemTuzlak Jun 16, 2026
7f31d2e
feat(ai-sandbox): core sandbox layer — contracts, defineSandbox/ensur…
AlemTuzlak Jun 16, 2026
6863002
feat(ai-sandbox-local-process): local-process provider + exec-backed …
AlemTuzlak Jun 16, 2026
9a23fd1
feat(ai-sandbox): capability wiring into chatStream + NDJSON harness …
AlemTuzlak Jun 16, 2026
3dc49ac
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
9cafe18
fix(ai-sandbox): harden exec-backed git against argv injection + cred…
AlemTuzlak Jun 16, 2026
7e8cc54
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
127b215
feat(ai-sandbox-docker): Docker sandbox provider (dockerode)
AlemTuzlak Jun 16, 2026
e69f79a
feat(ai-claude-code): run Claude Code inside a sandbox (harness adapter)
AlemTuzlak Jun 16, 2026
3c183f2
feat(example): sandbox-coding-agent local e2e demo
AlemTuzlak Jun 16, 2026
b52be7a
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
ec7160a
feat(ai-sandbox,ai-claude-code): sandbox policy capability + claude f…
AlemTuzlak Jun 16, 2026
0891ee2
docs(sandbox): add Sandboxes overview page + ai-sandbox agent skill
AlemTuzlak Jun 16, 2026
3cbbd0c
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
d2c9a68
feat(ai-claude-code): MCP tool-proxy bridge — chat() tools into the i…
AlemTuzlak Jun 16, 2026
1a13737
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
7432bb7
feat(ai-sandbox-cloudflare): Cloudflare Containers sandbox provider
AlemTuzlak Jun 16, 2026
12e38f0
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
139e213
feat(ai-codex): run Codex inside a sandbox (harness adapter)
AlemTuzlak Jun 16, 2026
9439705
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
d828149
feat(ai-gemini-cli): run Gemini CLI inside a sandbox (harness adapter)
AlemTuzlak Jun 16, 2026
f1155f6
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
d49b8a5
feat(ai-opencode): run OpenCode inside a sandbox (harness adapter)
AlemTuzlak Jun 16, 2026
02dfe2f
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
a3c08bb
feat(ai-sandbox): hoist MCP tool-proxy to core + bridge tools in all …
AlemTuzlak Jun 16, 2026
aa65d2a
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
bff4a44
feat(ai-sandbox): interactive-approval core plumbing
AlemTuzlak Jun 16, 2026
0f6fcb4
feat(ai-sandbox): interactive approvals across all four harness adapters
AlemTuzlak Jun 16, 2026
ec2d77a
ci: apply automated fixes
autofix-ci[bot] Jun 16, 2026
4eb7a5b
feat(ai-sandbox): file-event hooks + issue-triage example
AlemTuzlak Jun 17, 2026
f82a981
ci: apply automated fixes
autofix-ci[bot] Jun 17, 2026
7f618b9
feat(ai): sandbox middleware hook types, runSandboxFile, debug catego…
AlemTuzlak Jun 17, 2026
57c89c4
test(ai): sandbox debug category, runSandboxFile dispatch, runtime emit
AlemTuzlak Jun 17, 2026
ed450bf
feat(ai): emit sandbox.file events from the chat engine via SandboxRu…
AlemTuzlak Jun 17, 2026
a634690
feat(ai-sandbox): declarative sandbox hooks via withSandbox; remove w…
AlemTuzlak Jun 17, 2026
a139f8d
docs(ai-sandbox): document declarative sandbox hooks
AlemTuzlak Jun 17, 2026
7cd520e
fix(ai-sandbox): stop watcher on abort, swallow definition-hook error…
AlemTuzlak Jun 17, 2026
57f87dd
fix(changeset): reconcile sandbox-hooks changesets to the shipped dec…
AlemTuzlak Jun 17, 2026
deb2ac1
feat(ai): cursor + chat() resume seam, shared locks capability, CUSTO…
AlemTuzlak Jun 18, 2026
282368b
refactor(ai-sandbox): use shared core locks token; feat: harness re-a…
AlemTuzlak Jun 18, 2026
56c986f
feat(persistence): @tanstack/ai-persistence + SQL core + backends + s…
AlemTuzlak Jun 18, 2026
24a088b
feat(ai-client): in-session auto-resume
AlemTuzlak Jun 18, 2026
b13bbb2
docs(persistence): overview page, ai-persistence skill, changeset, kn…
AlemTuzlak Jun 18, 2026
767e1c6
ci: apply automated fixes
autofix-ci[bot] Jun 18, 2026
42 changes: 42 additions & 0 deletions .agent/self-learning/coupling.json
Original file line number Diff line number Diff line change
@@ -61,6 +61,48 @@
"why": "Per CLAUDE.md, every feature / bug fix / behavior change MUST include E2E test coverage. When a new public capability is added, a corresponding spec under testing/e2e/tests/ plus a fixture under testing/e2e/fixtures/ are required — often plus a new entry in feature-support.ts and types.ts. The full pattern is: add the Feature flag, decide provider support, optionally add a per-feature config (system prompt, schema, tools), wire it through src/routes/api.chat.ts (or the relevant route), write the fixture(s), write the spec iterating `providersFor(feature)`. Spec must run against every supported provider so non-native-streaming providers exercise the fallback path. Skip only for refactors that don't change observable behavior."
}
]
},
{
"id": "sandbox-core-contract-touches-providers",
"trigger": "packages/ai-sandbox/src/**/*.ts",
"impacts": [
{
"target": [
"packages/ai-sandbox-local-process/src/**/*.ts",
"packages/ai-sandbox-docker/src/**/*.ts",
"packages/ai-sandbox-cloudflare/src/**/*.ts"
],
"kind": "change-required",
"why": "The SandboxProvider / SandboxHandle / SandboxCapabilities contracts in @tanstack/ai-sandbox are the seam every provider package implements. When a method, capability flag, lifecycle field, or the ensure/resume algorithm changes in core, every provider package must be updated in the same PR or it falls out of contract (silent type breaks, missing capability handling, broken resume). Also re-check capability degradation: a provider that returns capabilities().snapshots===false must keep working when core adds a snapshot-dependent path."
}
]
},
{
"id": "harness-adapter-sandbox-execution",
"trigger": "packages/ai-claude-code/src/**/*.ts",
"impacts": [
{
"target": [
"packages/ai-codex/src/**/*.ts",
"packages/ai-gemini-cli/src/**/*.ts",
"packages/ai-opencode/src/**/*.ts",
"packages/ai-sandbox/src/**/*.ts"
],
"kind": "change-required",
"why": "All four harness adapters share one execution contract: declare requires:[SandboxCapability], spawn the agent CLI via sandbox.process (never local child_process), pipe its native stream-json/ACP stdout through the per-adapter translate layer, and proxy host tools via the MCP-over-channel bridge. When the sandbox-execution pattern, the host MCP tool-bridge shape, the per-run bearer-token/channel contract, or the policy->native-permission mapping changes in one adapter (or in @tanstack/ai-sandbox), mirror it across the other harness adapters so they don't diverge into incompatible execution paths."
}
]
},
{
"id": "sandbox-source-persistence-ready",
"trigger": "packages/ai-sandbox*/src/**/*.ts",
"impacts": [
{
"target": ["packages/ai-sandbox/src/**/*.ts"],
"kind": "change-required",
"why": "The sandbox layer ships with no persistence package but MUST stay persistence-ready so the persistence proposal drops in without re-architecture. Invariant to preserve on any sandbox change: SandboxStore and LockStore stay PLUGGABLE optional capabilities (in-memory defaults only - never hardcode storage), emitted chunks stay conceptually offset-addressable ({runId, seq, ts, chunk}) so a future EventLog/DurableRunStream can capture+replay by cursor, and approvals keep using the existing resume-based approval-requested flow. Do not introduce a sandbox-owned durable store, a bespoke replay buffer, or a non-AG-UI event type that the persistence layer would later have to rip out."
}
]
}
]
}
4 changes: 4 additions & 0 deletions .agentsroom/.gitignore
@@ -0,0 +1,4 @@
# AgentsRoom: personal files (not committed to git)
*-personal.json
agents-local.json
sessions/
10 changes: 10 additions & 0 deletions .agentsroom/agents.json
@@ -0,0 +1,10 @@
[
{
"role": "fullstack",
"model": "opus",
"customName": "Full-Stack Developer",
"isPersonal": false,
"id": "agent-1776361243376-3sekdc",
"claudeSessionId": "96773a93-be2a-45a9-a732-ceb224d3d0e5"
}
]
4 changes: 4 additions & 0 deletions .agentsroom/prompts.json
@@ -0,0 +1,4 @@
{
"folders": [],
"prompts": []
}
5 changes: 5 additions & 0 deletions .changeset/ai-claude-code-initial.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-claude-code': minor
---

New `@tanstack/ai-claude-code` package: a Claude Code **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]` and spawns the `claude` CLI (`claude -p --output-format stream-json`) inside the sandbox provided by `withSandbox(...)`, streaming its events back as AG-UI chunks. Claude Code owns the agent loop and executes its own native tools (bash, file edits, search) against the sandbox workspace; their activity streams back as resolved tool-call events. `chat()`-provided server tools are bridged to the in-sandbox agent over a host-side MCP tool-proxy (calls are proxied back to the host where `execute()` runs). Sessions are resumable via `modelOptions.sessionId` (surfaced through a `claude-code.session-id` custom event), and the working-tree diff is emitted as a `file.changed` custom event after each run. A `defineSandboxPolicy` (allow/ask/deny command globs + file-write/network capability rules) is enforced via Claude Code's `--permission-prompt-tool`: each native tool use is checked against the policy and the client's approval decisions, and an `ask` action with no decision yet surfaces an `approval-requested` event (the client approves and re-runs to continue). Requires the `claude` executable and `ANTHROPIC_API_KEY` to be available in the sandbox (e.g. via `workspace.secrets`).
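The allow/ask/deny check described in this changeset can be sketched in isolation. The following is a minimal, self-contained illustration and not the `defineSandboxPolicy` implementation: the `CommandPolicy` shape, the `decide` helper, the deny-then-ask-then-allow precedence, the default-deny for unmatched commands, and keying client decisions by the raw command string are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; not the @tanstack/ai-sandbox API.
type Action = 'allow' | 'ask' | 'deny'
type Outcome = 'run' | 'refuse' | 'approval-requested'

interface CommandPolicy {
  allow: string[]
  ask: string[]
  deny: string[]
}

// Convert a simple glob (only `*` is special) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*')
  return new RegExp(`^${escaped}$`)
}

function matchesAny(globs: string[], command: string): boolean {
  return globs.some((glob) => globToRegExp(glob).test(command))
}

// Assumed precedence: deny wins, then ask, then allow.
// Commands that match nothing are denied.
function resolveAction(policy: CommandPolicy, command: string): Action {
  if (matchesAny(policy.deny, command)) return 'deny'
  if (matchesAny(policy.ask, command)) return 'ask'
  if (matchesAny(policy.allow, command)) return 'allow'
  return 'deny'
}

// Combine the policy with the client's recorded approval decisions.
// An `ask` action with no decision yet yields 'approval-requested'.
function decide(
  policy: CommandPolicy,
  command: string,
  decisions: Map<string, boolean>,
): Outcome {
  const action = resolveAction(policy, command)
  if (action === 'allow') return 'run'
  if (action === 'deny') return 'refuse'
  const decision = decisions.get(command)
  if (decision === undefined) return 'approval-requested'
  return decision ? 'run' : 'refuse'
}
```

With a policy such as `{ allow: ['npm *'], ask: ['git push*'], deny: ['rm -rf *'] }`, a first `git push origin main` would surface an approval request, and a re-run carrying an approving decision for that command would let it proceed.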
5 changes: 5 additions & 0 deletions .changeset/ai-codex-initial.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-codex': minor
---

New `@tanstack/ai-codex` package: a Codex **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]` and spawns `codex exec --experimental-json` inside the sandbox provided by `withSandbox(...)` (mirroring `@openai/codex-sdk`'s own CLI invocation), feeding the prompt via stdin and streaming its JSONL thread events back as AG-UI chunks. Codex owns the agent loop and executes its built-in tools (shell, file changes, web search, todo lists) against the sandbox workspace. Threads are resumable via `modelOptions.sessionId` (surfaced through a `codex.session-id` custom event); sandbox mode / approval policy / reasoning effort map to Codex CLI flags. Requires the `codex` executable and `CODEX_API_KEY` (or a `codex login`) in the sandbox. `chat()`-provided server tools are bridged into the agent via the host MCP tool-proxy. A `defineSandboxPolicy` is mapped onto Codex's coarse permission knobs (sandbox mode, `approval_policy`, `network_access`); because `codex exec` runs non-interactively with no per-action host callback, the fine-grained resume-based interactive-approval flow is not available for Codex (it refuses, rather than prompts for, actions needing approval).
5 changes: 5 additions & 0 deletions .changeset/ai-gemini-cli-initial.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-gemini-cli': minor
---

New `@tanstack/ai-gemini-cli` package: a Gemini CLI **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]` and spawns `gemini --acp` (Agent Client Protocol) inside the sandbox provided by `withSandbox(...)`, driving it over the sandbox's duplex process IO (the ACP transport is adapted from the sandbox `SpawnHandle`; all ACP protocol handling is reused). Gemini CLI owns the agent loop and executes its built-in tools (shell, file edits, search) against the sandbox workspace; assistant text/thinking stream as token-level deltas and tool activity as resolved tool-call events. Sessions are resumable via `modelOptions.sessionId` (surfaced through a `gemini-cli.session-id` custom event, with graceful fallback to transcript replay). ACP permission requests are answered by a configurable never-hanging policy (`default` / `acceptEdits` / `bypassPermissions` or a custom handler). An action the policy would reject, with no client decision yet, surfaces an `approval-requested` event so the client can approve and re-run to grant it (interactive approvals). Headless auth is selectable up front via `authMethodId`. Requires the `gemini` CLI in the sandbox. `chat()`-provided server tools are bridged into the agent via the host MCP tool-proxy.
5 changes: 5 additions & 0 deletions .changeset/ai-opencode-initial.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-opencode': minor
---

New `@tanstack/ai-opencode` package: an OpenCode **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]`, spawns `opencode serve` inside the sandbox provided by `withSandbox(...)`, exposes its port, and connects the `@opencode-ai/sdk` HTTP client to it via `baseUrl`. OpenCode owns the agent loop and executes its built-in tools (shell, file edits, search) against the sandbox workspace; assistant text/thinking stream as token-level deltas and tool activity as resolved tool-call events. Sessions are resumable. OpenCode permission requests are answered by a configurable `permissionMode` (`default` / `acceptEdits` / `bypassPermissions` or a custom handler). A request the policy would reject, with no client decision yet, surfaces an `approval-requested` event so the client can approve and re-run to grant it (interactive approvals). Requires the `opencode` CLI in the sandbox (Docker: publish the server port via `publishPorts`). `chat()`-provided server tools are bridged into the agent via the host MCP tool-proxy.
5 changes: 5 additions & 0 deletions .changeset/ai-sandbox-cloudflare.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-sandbox-cloudflare': minor
---

New `@tanstack/ai-sandbox-cloudflare` package: a Cloudflare Containers sandbox provider (`cloudflareSandbox`) built on `@cloudflare/sandbox`, for running harness adapters at the edge inside a Worker. Implements the uniform `SandboxHandle` (exec, base64-backed fs, git, `exposePort` preview URLs, env) over the Cloudflare Sandbox Durable Object. The container disk is ephemeral and snapshots are not yet GA, so `withSandbox` re-bootstraps under the same identity across cold starts (`durableFilesystem`/`snapshots` are reported false). Background processes don't expose stdin on Cloudflare, so stdin-fed harnesses (e.g. Claude Code) need a stdin-capable provider; `exec` works fully.
31 changes: 31 additions & 0 deletions .changeset/persistence-layer.md
@@ -0,0 +1,31 @@
---
'@tanstack/ai': minor
'@tanstack/ai-sandbox': patch
'@tanstack/ai-client': minor
'@tanstack/ai-claude-code': patch
'@tanstack/ai-codex': patch
'@tanstack/ai-gemini-cli': patch
'@tanstack/ai-opencode': patch
'@tanstack/ai-persistence': minor
'@tanstack/ai-persistence-sql': minor
'@tanstack/ai-persistence-sqlite': minor
'@tanstack/ai-persistence-postgres': minor
'@tanstack/ai-persistence-cloudflare': minor
'@tanstack/ai-persistence-drizzle': minor
'@tanstack/ai-persistence-prisma': minor
'@tanstack/ai-sandbox-persistence': minor
---

Persistence + resumable runs as composable `chat()` middleware.

`withPersistence(...)` makes any run durable: it loads/saves thread message history (server-authoritative), creates/updates run records, persists every AG-UI `StreamChunk` to an append-only event log, and persists usage. It is fully **optional** — a `chat()` with no persistence middleware is byte-for-byte unchanged, and it works for both non-sandbox and sandbox (agent-mode) runs.

**Resume.** Each persisted chunk carries an in-band, opaque `cursor` (a monotonic per-run sequence). A client that disconnects mid-run reconnects with the run's `runId` + last `cursor`; `chat({ cursor })` replays the persisted event tail after that cursor, then — for harness adapters that re-attach to their still-running in-sandbox process — continues live. The headless `ChatClient` tracks the cursor and exposes `resume()` / `getResumeState()` / `maybeAutoResume()` with an `autoResume` opt-out.

**Event model.** The persisted log is the AG-UI `StreamChunk` stream itself (no parallel event type); agent activity (file changes, process output, approvals, artifacts, sandbox lifecycle) rides on well-known `CUSTOM` events catalogued in `@tanstack/ai`.

**Backends (shared SQL core + thin adapters).** One SQL implementation behind a minimal `SqlDriver` (`@tanstack/ai-persistence-sql`), with backends for SQLite (`-sqlite`, node:sqlite/better-sqlite3), Postgres (`-postgres`, pg), Cloudflare D1 (`-cloudflare`), and bring-your-own Drizzle (`-drizzle`) and Prisma (`-prisma`). Raw drivers auto-migrate (versioned, opt-out); ORMs own their schema. `memoryPersistence()` ships in core for tests/examples.

**Agent mode.** `@tanstack/ai-sandbox-persistence` bridges a durable SQL-backed `SandboxStore` and the durable `LockStore` into `withSandbox`, so sandbox resume and ensure-locking survive across processes. The shared `locks` capability now lives in `@tanstack/ai` (one token across the sandbox and persistence layers); `@tanstack/ai-sandbox` re-exports it for back-compat.

Approvals are persisted and a durable approval controller feeds decisions back into the existing deny-and-replay flow. Cloudflare is compile-verified (Workers runtime), Postgres runtime-verification is via Docker, and live harness re-attach is verified with the real CLIs; everything else is unit/integration-tested. The Playwright E2E suite is a follow-up.
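The cursor-based resume described in this changeset can be illustrated with a small in-memory log. This is a sketch under stated assumptions, not the `@tanstack/ai-persistence` implementation: `InMemoryEventLog`, `append`, and `replayAfter` are hypothetical names, and the cursor is exposed here as a plain per-run sequence number even though the changeset describes it as opaque to clients.

```typescript
// Hypothetical types for illustration; not the @tanstack/ai-persistence API.
interface PersistedChunk<T> {
  runId: string
  seq: number // monotonic per-run sequence, used as the cursor in this sketch
  chunk: T
}

class InMemoryEventLog<T> {
  private readonly byRun = new Map<string, PersistedChunk<T>[]>()

  // Append a chunk to the run's log and return it with its assigned cursor.
  append(runId: string, chunk: T): PersistedChunk<T> {
    const log = this.byRun.get(runId) ?? []
    const entry: PersistedChunk<T> = { runId, seq: log.length + 1, chunk }
    log.push(entry)
    this.byRun.set(runId, log)
    return entry
  }

  // Return the persisted tail strictly after the given cursor.
  // A cursor of 0 replays the whole run.
  replayAfter(runId: string, cursor: number): PersistedChunk<T>[] {
    return (this.byRun.get(runId) ?? []).filter((entry) => entry.seq > cursor)
  }
}
```

A client that last saw cursor `1` for a run would call `replayAfter(runId, 1)` on reconnect and receive only the chunks it missed; any live continuation would then be appended after that tail.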
23 changes: 23 additions & 0 deletions .changeset/sandbox-hooks-redesign.md
@@ -0,0 +1,23 @@
---
'@tanstack/ai': minor
'@tanstack/ai-sandbox': minor
'@tanstack/ai-sandbox-local-process': minor
---

Declarative sandbox file-event hooks: observe file create / change / delete
inside a sandbox and have them fire automatically during a chat run.

- `@tanstack/ai`: chat middleware gains an optional `sandbox` hook group
(`onFile`/`onFileCreate`/`onFileChange`/`onFileDelete`), a `SandboxFileEvent`
type, and a `sandbox` debug-logging category. The engine auto-emits a
`CUSTOM` `sandbox.file` event per change (client reads it from `parts`).
- `@tanstack/ai-sandbox`: `defineSandbox({ hooks, fileEvents })` declares
file + lifecycle hooks (`onFile*`/`onReady`/`onError`/`onDestroy`) that fire
automatically while the sandbox runs in a chat — `withSandbox` owns the
watcher. The watcher is provider-agnostic: a native `fs.watch` fast-path when
the provider advertises it, otherwise a portable `find -printf` mtime
snapshot-diff poll (no extra deps; `.git`/`node_modules` ignored by default).
`watchWorkspace()` / `diffSnapshots` remain as low-level building blocks.
- `@tanstack/ai-sandbox-local-process`: implements the optional `fs.watch` seam
via Node's recursive `fs.watch` (Windows/macOS); Linux falls back to the core
exec-poll automatically.
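The portable snapshot-diff poll mentioned above can be sketched as two pure functions. This is an illustration only, not the package's `diffSnapshots`: the names `parseFindOutput` and `diffMtimeSnapshots` are hypothetical, and the input format assumes GNU `find` was run with `-printf '%T@ %p\n'` (modification time, a space, then the path), which the changeset does not specify.

```typescript
// Hypothetical helpers for illustration; not the @tanstack/ai-sandbox API.
type Snapshot = Map<string, string> // path -> mtime (kept as text to avoid float rounding)

interface FileEvent {
  type: 'create' | 'change' | 'delete'
  path: string
}

// Parse lines of the assumed form "<mtime> <path>" into a snapshot.
function parseFindOutput(output: string): Snapshot {
  const snapshot: Snapshot = new Map()
  output.split('\n').forEach((line) => {
    const space = line.indexOf(' ')
    if (space <= 0) return // skip blank or malformed lines
    snapshot.set(line.slice(space + 1), line.slice(0, space))
  })
  return snapshot
}

// Compare two snapshots and report created, changed, and deleted paths.
function diffMtimeSnapshots(prev: Snapshot, next: Snapshot): FileEvent[] {
  const events: FileEvent[] = []
  next.forEach((mtime, path) => {
    const before = prev.get(path)
    if (before === undefined) events.push({ type: 'create', path })
    else if (before !== mtime) events.push({ type: 'change', path })
  })
  prev.forEach((_mtime, path) => {
    if (!next.has(path)) events.push({ type: 'delete', path })
  })
  return events
}
```

A poller would take one snapshot per interval, diff it against the previous one, and dispatch each resulting event to the matching `onFileCreate` / `onFileChange` / `onFileDelete` hook.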
13 changes: 13 additions & 0 deletions .changeset/sandbox-layer.md
@@ -0,0 +1,13 @@
---
'@tanstack/ai-sandbox': minor
'@tanstack/ai-sandbox-local-process': minor
'@tanstack/ai-sandbox-docker': minor
'@tanstack/ai': minor
---

New provider-agnostic sandbox layer so harness adapters can run **inside** isolated sandboxes.

- **`@tanstack/ai-sandbox`** — `defineSandbox()` (lazy controller + resume→restoreSnapshot→create+bootstrap ensure algorithm), `withSandbox()` middleware, `defineWorkspace()` (git/local source, package-manager detection, setup, skills, secrets), `defineSandboxPolicy()`, the `SandboxProvider`/`SandboxHandle`/`SandboxCapabilities` contracts, capability tokens (`SandboxCapability` plus the optional `SandboxStore`/`Locks` persistence seams with in-memory defaults), `bootstrapWorkspace`, `createExecBackedGit`, `spawnNdjson` (run an agent CLI in a sandbox and stream its NDJSON stdout), the host MCP tool-proxy bridge (`startHostToolBridge` — exposes `chat()` server tools to the in-sandbox agent, with an optional permission-prompt tool), and the shared interactive-approval primitives (`resolveApproval`, `approvalId`, `buildApprovalRequestedEvent`) harness adapters use to enforce a policy and surface `approval-requested` events for client-in-the-loop approvals.
- **`@tanstack/ai-sandbox-local-process`** — `localProcessSandbox()`: runs the agent on the host through the uniform `SandboxHandle` (no isolation; the fast dev loop).
- **`@tanstack/ai-sandbox-docker`** — `dockerSandbox()`: runs the agent inside an isolated Docker container (dockerode), with commit-based snapshots, fork, and resume-by-id.
- **`@tanstack/ai`** — `TextOptions.capabilities` exposes the middleware capability context to adapters so harness adapters that declare `requires: [...]` can read provided capabilities from `chatStream`; `TextOptions.approvals` threads client approval decisions through to adapters for the interactive-approval (deny + `approval-requested` + re-run) flow; `DefinedChatMiddleware` and `AnyChatMiddleware` are now exported for portable middleware authoring.
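The resume→restoreSnapshot→create+bootstrap ensure order named in the `defineSandbox()` bullet can be sketched as a simple fallback chain. This is an illustration under assumptions, not the library's code: `Provider`, `Handle`, `StoredRecord`, and `ensureSandbox` are hypothetical names, and treating a missing method or an `undefined` result as "fall through to the next step" is a guess at how capability degradation might work.

```typescript
// Hypothetical contracts for illustration; not the SandboxProvider interface.
interface Handle {
  id: string
}

interface Provider {
  resume?(sandboxId: string): Promise<Handle | undefined>
  restoreSnapshot?(snapshotId: string): Promise<Handle | undefined>
  create(): Promise<Handle>
}

interface StoredRecord {
  sandboxId?: string
  snapshotId?: string
}

interface EnsureResult {
  handle: Handle
  path: 'resume' | 'restoreSnapshot' | 'create'
}

// Try the cheapest option first and fall back step by step.
// Bootstrap runs only when a fresh sandbox is created.
async function ensureSandbox(
  provider: Provider,
  record: StoredRecord,
  bootstrap: (handle: Handle) => Promise<void>,
): Promise<EnsureResult> {
  if (record.sandboxId && provider.resume) {
    const resumed = await provider.resume(record.sandboxId)
    if (resumed) return { handle: resumed, path: 'resume' }
  }
  if (record.snapshotId && provider.restoreSnapshot) {
    const restored = await provider.restoreSnapshot(record.snapshotId)
    if (restored) return { handle: restored, path: 'restoreSnapshot' }
  }
  const created = await provider.create()
  await bootstrap(created)
  return { handle: created, path: 'create' }
}
```

In this sketch a provider without snapshot support simply omits `restoreSnapshot`, so the chain skips that step and still reaches `create`.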
4 changes: 4 additions & 0 deletions .gitignore
@@ -78,3 +78,7 @@ solo.yml
# Agent scratch output (gap-analysis reports, triage notes — generated locally)
.agent/gap-analysis/
.agent/triage/

/OpenCode.md
.agentsroom/
.opencode/