TanStack · AlemTuzlak · Apr 16, 2026 · Jun 13, 2026
diff --git a/.agent/self-learning/coupling.json b/.agent/self-learning/coupling.json
@@ -61,6 +61,48 @@
           "why": "Per CLAUDE.md, every feature / bug fix / behavior change MUST include E2E test coverage. When a new public capability is added, a corresponding spec under testing/e2e/tests/ plus a fixture under testing/e2e/fixtures/ are required — often plus a new entry in feature-support.ts and types.ts. The full pattern is: add the Feature flag, decide provider support, optionally add a per-feature config (system prompt, schema, tools), wire it through src/routes/api.chat.ts (or the relevant route), write the fixture(s), write the spec iterating `providersFor(feature)`. Spec must run against every supported provider so non-native-streaming providers exercise the fallback path. Skip only for refactors that don't change observable behavior."
         }
       ]
+    },
+    {
+      "id": "sandbox-core-contract-touches-providers",
+      "trigger": "packages/ai-sandbox/src/**/*.ts",
+      "impacts": [
+        {
+          "target": [
+            "packages/ai-sandbox-local-process/src/**/*.ts",
+            "packages/ai-sandbox-docker/src/**/*.ts",
+            "packages/ai-sandbox-cloudflare/src/**/*.ts"
+          ],
+          "kind": "change-required",
+          "why": "The SandboxProvider / SandboxHandle / SandboxCapabilities contracts in @tanstack/ai-sandbox are the seam every provider package implements. When a method, capability flag, lifecycle field, or the ensure/resume algorithm changes in core, every provider package must be updated in the same PR or it falls out of contract (silent type breaks, missing capability handling, broken resume). Also re-check capability degradation: a provider that returns capabilities().snapshots===false must keep working when core adds a snapshot-dependent path."
+        }
+      ]
+    },
+    {
+      "id": "harness-adapter-sandbox-execution",
+      "trigger": "packages/ai-claude-code/src/**/*.ts",
+      "impacts": [
+        {
+          "target": [
+            "packages/ai-codex/src/**/*.ts",
+            "packages/ai-gemini-cli/src/**/*.ts",
+            "packages/ai-opencode/src/**/*.ts",
+            "packages/ai-sandbox/src/**/*.ts"
+          ],
+          "kind": "change-required",
+          "why": "All four harness adapters share one execution contract: declare requires:[SandboxCapability], spawn the agent CLI via sandbox.process (never local child_process), pipe its native stream-json/ACP stdout through the per-adapter translate layer, and proxy host tools via the MCP-over-channel bridge. When the sandbox-execution pattern, the host MCP tool-bridge shape, the per-run bearer-token/channel contract, or the policy->native-permission mapping changes in one adapter (or in @tanstack/ai-sandbox), mirror it across the other harness adapters so they don't diverge into incompatible execution paths."
+        }
+      ]
+    },
+    {
+      "id": "sandbox-source-persistence-ready",
+      "trigger": "packages/ai-sandbox*/src/**/*.ts",
+      "impacts": [
+        {
+          "target": ["packages/ai-sandbox/src/**/*.ts"],
+          "kind": "change-required",
+          "why": "The sandbox layer ships with zero persistence package but MUST stay persistence-ready so the persistence proposal drops in without re-architecture. Invariant to preserve on any sandbox change: SandboxStore and LockStore stay PLUGGABLE optional capabilities (in-memory defaults only - never hardcode storage), emitted chunks stay conceptually offset-addressable ({runId, seq, ts, chunk}) so a future EventLog/DurableRunStream can capture+replay by cursor, and approvals keep using the existing resume-based approval-requested flow. Do not introduce a sandbox-owned durable store, a bespoke replay buffer, or a non-AG-UI event type that the persistence layer would later have to rip out."
+        }
+      ]
     }
   ]
 }
diff --git a/.agentsroom/.gitignore b/.agentsroom/.gitignore
@@ -0,0 +1,4 @@
+# AgentsRoom: personal files (not committed to git)
+*-personal.json
+agents-local.json
+sessions/
diff --git a/.agentsroom/agents.json b/.agentsroom/agents.json
@@ -0,0 +1,10 @@
+[
+  {
+    "role": "fullstack",
+    "model": "opus",
+    "customName": "Full-Stack Developer",
+    "isPersonal": false,
+    "id": "agent-1776361243376-3sekdc",
+    "claudeSessionId": "96773a93-be2a-45a9-a732-ceb224d3d0e5"
+  }
+]
diff --git a/.agentsroom/prompts.json b/.agentsroom/prompts.json
@@ -0,0 +1,4 @@
+{
+  "folders": [],
+  "prompts": []
+}
diff --git a/.changeset/ai-claude-code-initial.md b/.changeset/ai-claude-code-initial.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-claude-code': minor
+---
+
+New `@tanstack/ai-claude-code` package: a Claude Code **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]` and spawns the `claude` CLI (`claude -p --output-format stream-json`) inside the sandbox provided by `withSandbox(...)`, streaming its events back as AG-UI chunks. Claude Code owns the agent loop and executes its own native tools (bash, file edits, search) against the sandbox workspace; their activity streams back as resolved tool-call events. `chat()`-provided server tools are bridged to the in-sandbox agent over a host-side MCP tool-proxy (calls are proxied back to the host where `execute()` runs). Sessions are resumable via `modelOptions.sessionId` (surfaced through a `claude-code.session-id` custom event), and the working-tree diff is emitted as a `file.changed` custom event after each run. A `defineSandboxPolicy` (allow/ask/deny command globs + file-write/network capability rules) is enforced via Claude Code's `--permission-prompt-tool`: each native tool use is checked against the policy and the client's approval decisions, and an `ask` action with no decision yet surfaces an `approval-requested` event (the client approves and re-runs to continue). Requires the `claude` executable and `ANTHROPIC_API_KEY` to be available in the sandbox (e.g. via `workspace.secrets`).
diff --git a/.changeset/ai-codex-initial.md b/.changeset/ai-codex-initial.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-codex': minor
+---
+
+New `@tanstack/ai-codex` package: a Codex **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]` and spawns `codex exec --experimental-json` inside the sandbox provided by `withSandbox(...)` (mirroring `@openai/codex-sdk`'s own CLI invocation), feeding the prompt via stdin and streaming its JSONL thread events back as AG-UI chunks. Codex owns the agent loop and executes its built-in tools (shell, file changes, web search, todo lists) against the sandbox workspace. Threads are resumable via `modelOptions.sessionId` (surfaced through a `codex.session-id` custom event); sandbox mode / approval policy / reasoning effort map to codex CLI flags. Requires the `codex` executable and `CODEX_API_KEY` (or a `codex login`) in the sandbox. `chat()`-provided server tools are bridged into the agent via the host MCP tool-proxy. A `defineSandboxPolicy` is mapped onto Codex's coarse permission knobs (sandbox mode, `approval_policy`, `network_access`); because `codex exec` runs non-interactively with no per-action host callback, the fine-grained resume-based interactive-approval flow is not available for Codex (it refuses, rather than prompts for, actions needing approval).
diff --git a/.changeset/ai-gemini-cli-initial.md b/.changeset/ai-gemini-cli-initial.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-gemini-cli': minor
+---
+
+New `@tanstack/ai-gemini-cli` package: a Gemini CLI **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]` and spawns `gemini --acp` (Agent Client Protocol) inside the sandbox provided by `withSandbox(...)`, driving it over the sandbox's duplex process IO (the ACP transport is adapted from the sandbox `SpawnHandle`; all ACP protocol handling is reused). Gemini CLI owns the agent loop and executes its built-in tools (shell, file edits, search) against the sandbox workspace; assistant text/thinking stream as token-level deltas and tool activity as resolved tool-call events. Sessions are resumable via `modelOptions.sessionId` (surfaced through a `gemini-cli.session-id` custom event, with graceful fallback to transcript replay). ACP permission requests are answered by a configurable never-hanging policy (`default` / `acceptEdits` / `bypassPermissions` or a custom handler); an action the policy would reject with no client decision yet surfaces an `approval-requested` event so the client can approve and re-run to grant it (interactive approvals). Headless auth is selectable up front via `authMethodId`. Requires the `gemini` CLI in the sandbox. `chat()`-provided server tools are bridged into the agent via the host MCP tool-proxy.
diff --git a/.changeset/ai-opencode-initial.md b/.changeset/ai-opencode-initial.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-opencode': minor
+---
+
+New `@tanstack/ai-opencode` package: an OpenCode **harness adapter that runs inside a sandbox**. It declares `requires: [SandboxCapability]`, spawns `opencode serve` inside the sandbox provided by `withSandbox(...)`, exposes its port, and connects the `@opencode-ai/sdk` HTTP client to it via `baseUrl`. OpenCode owns the agent loop and executes its built-in tools (shell, file edits, search) against the sandbox workspace; assistant text/thinking stream as token-level deltas and tool activity as resolved tool-call events. Sessions are resumable. OpenCode permission requests are answered by a configurable `permissionMode` (`default` / `acceptEdits` / `bypassPermissions` or a custom handler); a request the policy would reject with no client decision yet surfaces an `approval-requested` event so the client can approve and re-run to grant it (interactive approvals). Requires the `opencode` CLI in the sandbox (Docker: publish the server port via `publishPorts`). `chat()`-provided server tools are bridged into the agent via the host MCP tool-proxy.
diff --git a/.changeset/ai-sandbox-cloudflare.md b/.changeset/ai-sandbox-cloudflare.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-sandbox-cloudflare': minor
+---
+
+New `@tanstack/ai-sandbox-cloudflare` package: a Cloudflare Containers sandbox provider (`cloudflareSandbox`) built on `@cloudflare/sandbox`, for running harness adapters at the edge inside a Worker. Implements the uniform `SandboxHandle` (exec, base64-backed fs, git, `exposePort` preview URLs, env) over the Cloudflare Sandbox Durable Object. The container disk is ephemeral and snapshots are not yet GA, so `withSandbox` re-bootstraps under the same identity across cold starts (`durableFilesystem`/`snapshots` are reported false). Background processes don't expose stdin on Cloudflare, so stdin-fed harnesses (e.g. Claude Code) need a stdin-capable provider; `exec` works fully.
diff --git a/.changeset/persistence-layer.md b/.changeset/persistence-layer.md
@@ -0,0 +1,31 @@
+---
+'@tanstack/ai': minor
+'@tanstack/ai-sandbox': patch
+'@tanstack/ai-client': minor
+'@tanstack/ai-claude-code': patch
+'@tanstack/ai-codex': patch
+'@tanstack/ai-gemini-cli': patch
+'@tanstack/ai-opencode': patch
+'@tanstack/ai-persistence': minor
+'@tanstack/ai-persistence-sql': minor
+'@tanstack/ai-persistence-sqlite': minor
+'@tanstack/ai-persistence-postgres': minor
+'@tanstack/ai-persistence-cloudflare': minor
+'@tanstack/ai-persistence-drizzle': minor
+'@tanstack/ai-persistence-prisma': minor
+'@tanstack/ai-sandbox-persistence': minor
+---
+
+Persistence + resumable runs as composable `chat()` middleware.
+
+`withPersistence(...)` makes any run durable: it loads/saves thread message history (server-authoritative), creates/updates run records, persists every AG-UI `StreamChunk` to an append-only event log, and persists usage. It is fully **optional** — a `chat()` with no persistence middleware is byte-for-byte unchanged, and it works for both non-sandbox and sandbox (agent-mode) runs.
+
+**Resume.** Each persisted chunk carries an in-band, opaque `cursor` (a monotonic per-run sequence). A client that disconnects mid-run reconnects with the run's `runId` + last `cursor`; `chat({ cursor })` replays the persisted event tail after that cursor, then — for harness adapters that re-attach to their still-running in-sandbox process — continues live. The headless `ChatClient` tracks the cursor and exposes `resume()` / `getResumeState()` / `maybeAutoResume()` with an `autoResume` opt-out.
+
+**Event model.** The persisted log is the AG-UI `StreamChunk` stream itself (no parallel event type); agent activity (file changes, process output, approvals, artifacts, sandbox lifecycle) rides on well-known `CUSTOM` events catalogued in `@tanstack/ai`.
+
+**Backends (shared SQL core + thin adapters).** One SQL implementation behind a minimal `SqlDriver` (`@tanstack/ai-persistence-sql`), with backends for SQLite (`-sqlite`, node:sqlite/better-sqlite3), Postgres (`-postgres`, pg), Cloudflare D1 (`-cloudflare`), and bring-your-own Drizzle (`-drizzle`) and Prisma (`-prisma`). Raw drivers auto-migrate (versioned, opt-out); ORMs own their schema. `memoryPersistence()` ships in core for tests/examples.
+
+**Agent mode.** `@tanstack/ai-sandbox-persistence` bridges a durable SQL-backed `SandboxStore` and the durable `LockStore` into `withSandbox`, so sandbox resume and ensure-locking survive across processes. The shared `locks` capability now lives in `@tanstack/ai` (one token across the sandbox and persistence layers); `@tanstack/ai-sandbox` re-exports it for back-compat.
+
+Approvals are persisted and a durable approval controller feeds decisions back into the existing deny-and-replay flow. Cloudflare is compile-verified (Workers runtime), Postgres is runtime-verified via Docker, and live harness re-attach is verified with the real CLIs; everything else is unit/integration-tested. The Playwright E2E suite is a follow-up.
diff --git a/.changeset/sandbox-hooks-redesign.md b/.changeset/sandbox-hooks-redesign.md
@@ -0,0 +1,23 @@
+---
+'@tanstack/ai': minor
+'@tanstack/ai-sandbox': minor
+'@tanstack/ai-sandbox-local-process': minor
+---
+
+Declarative sandbox file-event hooks: observe file create / change / delete
+inside a sandbox and have them fire automatically during a chat run.
+
+- `@tanstack/ai`: chat middleware gains an optional `sandbox` hook group
+  (`onFile`/`onFileCreate`/`onFileChange`/`onFileDelete`), a `SandboxFileEvent`
+  type, and a `sandbox` debug-logging category. The engine auto-emits a
+  `CUSTOM` `sandbox.file` event per change (client reads it from `parts`).
+- `@tanstack/ai-sandbox`: `defineSandbox({ hooks, fileEvents })` declares
+  file + lifecycle hooks (`onFile*`/`onReady`/`onError`/`onDestroy`) that fire
+  automatically while the sandbox runs in a chat — `withSandbox` owns the
+  watcher. The watcher is provider-agnostic: a native `fs.watch` fast-path when
+  the provider advertises it, otherwise a portable `find -printf` mtime
+  snapshot-diff poll (no extra deps; `.git`/`node_modules` ignored by default).
+  `watchWorkspace()` / `diffSnapshots` remain as low-level building blocks.
+- `@tanstack/ai-sandbox-local-process`: implements the optional `fs.watch` seam
+  via Node's recursive `fs.watch` (Windows/macOS); Linux falls back to the core
+  exec-poll automatically.
diff --git a/.changeset/sandbox-layer.md b/.changeset/sandbox-layer.md
@@ -0,0 +1,13 @@
+---
+'@tanstack/ai-sandbox': minor
+'@tanstack/ai-sandbox-local-process': minor
+'@tanstack/ai-sandbox-docker': minor
+'@tanstack/ai': minor
+---
+
+New provider-agnostic sandbox layer so harness adapters can run **inside** isolated sandboxes.
+
+- **`@tanstack/ai-sandbox`** — `defineSandbox()` (lazy controller + resume→restoreSnapshot→create+bootstrap ensure algorithm), `withSandbox()` middleware, `defineWorkspace()` (git/local source, package-manager detection, setup, skills, secrets), `defineSandboxPolicy()`, the `SandboxProvider`/`SandboxHandle`/`SandboxCapabilities` contracts, capability tokens (`SandboxCapability` plus the optional `SandboxStore`/`Locks` persistence seams with in-memory defaults), `bootstrapWorkspace`, `createExecBackedGit`, `spawnNdjson` (run an agent CLI in a sandbox and stream its NDJSON stdout), the host MCP tool-proxy bridge (`startHostToolBridge` — exposes `chat()` server tools to the in-sandbox agent, with an optional permission-prompt tool), and the shared interactive-approval primitives (`resolveApproval`, `approvalId`, `buildApprovalRequestedEvent`) harness adapters use to enforce a policy and surface `approval-requested` events for client-in-the-loop approvals.
+- **`@tanstack/ai-sandbox-local-process`** — `localProcessSandbox()`: runs the agent on the host through the uniform `SandboxHandle` (no isolation; the fast dev loop).
+- **`@tanstack/ai-sandbox-docker`** — `dockerSandbox()`: runs the agent inside an isolated Docker container (dockerode), with commit-based snapshots, fork, and resume-by-id.
+- **`@tanstack/ai`** — `TextOptions.capabilities` exposes the middleware capability context to adapters so harness adapters that declare `requires: [...]` can read provided capabilities from `chatStream`; `TextOptions.approvals` threads client approval decisions through to adapters for the interactive-approval (deny + `approval-requested` + re-run) flow; `DefinedChatMiddleware` and `AnyChatMiddleware` are now exported for portable middleware authoring.
diff --git a/.gitignore b/.gitignore
@@ -78,3 +78,7 @@ solo.yml
 # Agent scratch output (gap-analysis reports, triage notes — generated locally)
 .agent/gap-analysis/
 .agent/triage/
+
+/OpenCode.md
+.agentsroom/
+.opencode/