feat(ai): sandboxed harness adapters (provider-agnostic sandbox layer) #774
…xample

New @tanstack/ai-claude-code package that runs Claude Code (via @anthropic-ai/claude-agent-sdk) as a TanStack AI chat backend. Unlike HTTP provider adapters, this is a harness adapter: Claude Code owns the agent loop and executes its built-in tools (bash, file edits, search) server-side.

- Stream translator maps Agent SDK messages to AG-UI events. Harness tool activity arrives as already-resolved TOOL_CALL_*/TOOL_CALL_RESULT pairs, and runs always finish with stop/length (never tool_calls), so the engine never re-executes harness tools. Every started tool call is guaranteed a result (synthesized on abort) to keep the engine's pending-call scan safe.
- TanStack toolDefinition() server tools are bridged into the harness as an in-process MCP server (raw JSON Schema passthrough, no zod round-trip). Client-side/approval tools fail fast: a documented v1 limitation.
- Stateful sessions: session id surfaced via a claude-code.session-id CUSTOM event; resume via modelOptions.sessionId (+ forkSession).
- Structured output uses the SDK's native outputFormat json_schema.
- settingSources defaults to ['project'] so servers don't inherit user-level ~/.claude config from the host machine.
- E2E: excluded from the aimock matrix (the subprocess can't carry X-Test-Id isolation); covered by 44 unit tests plus a gated live smoke spec (CLAUDE_CODE_E2E=1).

Also adds examples/ts-react-coding-agent: a TanStack Start app demoing session resume, the harness tool timeline, read-only/edit permission modes, tool bridging, and a sandboxed scratch workspace, with the agent registry structured so future Codex/Gemini CLI harness adapters can slot in.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
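The guarantee in the first bullet (every started tool call gets a result, synthesized on abort) amounts to bookkeeping over open call ids. A minimal sketch, assuming simplified event shapes: `ToolEvent` and `PendingToolCalls` are hypothetical names for illustration, not the package's AG-UI types or translator code.

```typescript
// Hypothetical, simplified AG-UI-like events; the real event types differ.
type ToolEvent =
  | { type: 'TOOL_CALL_START'; toolCallId: string; toolName: string }
  | { type: 'TOOL_CALL_RESULT'; toolCallId: string; content: string }

// Tracks tool calls that have started but have not yet produced a result.
class PendingToolCalls {
  private open = new Set<string>()

  observe(event: ToolEvent): void {
    if (event.type === 'TOOL_CALL_START') this.open.add(event.toolCallId)
    else this.open.delete(event.toolCallId)
  }

  // On abort, synthesize a result for every call still open so that no
  // started call is left without a matching TOOL_CALL_RESULT.
  closeOnAbort(reason: string): ToolEvent[] {
    const synthesized = [...this.open].map(
      (toolCallId): ToolEvent => ({
        type: 'TOOL_CALL_RESULT',
        toolCallId,
        content: JSON.stringify({ error: `aborted: ${reason}` }),
      }),
    )
    this.open.clear()
    return synthesized
  }
}
```

Emitting the synthesized results before the terminal event keeps any downstream "find unresolved tool calls" scan from mistaking an aborted harness call for one the engine should execute.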
…ters

Add two new coding-agent harness adapters alongside Claude Code:

- @tanstack/ai-codex drives OpenAI Codex via @openai/codex-sdk with local tool execution, resumable sessions (modelOptions.sessionId), structured output, and a localhost MCP bridge for TanStack server tools.
- @tanstack/ai-gemini-cli drives `gemini --acp` over the Agent Client Protocol with token-level streaming, resumable sessions, a configurable permission policy, and headless ACP auth method selection (authMethodId) so runs never stall on an interactive auth picker.

Wire both into the ts-react-coding-agent example: the agent dropdown keeps every harness selectable, and a server function (createServerFn) reports which agents are actually configured at runtime so the UI can surface a setup dialog for unconfigured ones. Includes adapter docs and changesets.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add the @tanstack/ai-opencode package, an OpenCode harness adapter that drives OpenCode (via @opencode-ai/sdk) as a TanStack AI chat backend with local tool execution, token-level streaming, stateful sessions, and TanStack tool bridging over a localhost MCP server. Wires the adapter into the ts-react-coding-agent example, adds the OpenCode adapter docs page, and anchors the OpenCode.md gitignore entry so it no longer shadows the docs page on case-insensitive filesystems.

Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
# pnpm-lock.yaml
…e, withSandbox, workspace, policy

- @tanstack/ai-sandbox: provider-agnostic SandboxHandle/SandboxProvider/SandboxCapabilities contracts
- capability tokens (SandboxCapability + optional SandboxStore/Locks), in-memory store/lock defaults
- defineSandbox lazy controller + ensure state machine (resume -> restoreSnapshot -> create+bootstrap) with capability-aware degradation
- withSandbox middleware (setup provides the handle; onFinish/onError snapshot + destroy)
- defineWorkspace (git/local/none + skills + secrets), provider-agnostic bootstrapWorkspace
- defineSandboxPolicy + evaluateCommand (glob, deny > ask > allow), compound sandbox key (secrets excluded)
- export DefinedChatMiddleware/AnyChatMiddleware from @tanstack/ai for portable middleware authoring
- 22 unit tests (ensure/policy/key/store); types + lint clean

Refs sandbox proposal (Phase A).
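The deny > ask > allow precedence for command rules can be sketched as "collect every matching rule, keep the strictest decision, else fall back to the default". This is a minimal illustration, not the package's `evaluateCommand`: the `CommandRule` shape, the `globToRegExp` helper, and the single-`*` glob syntax are assumptions.

```typescript
type Decision = 'allow' | 'ask' | 'deny'

// Hypothetical rule shape: a glob over the full command line plus a decision.
interface CommandRule {
  pattern: string
  decision: Decision
}

// Convert a simple glob ('*' matches any run of characters) to an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${escaped}$`)
}

// When several rules match, the strictest wins: deny > ask > allow.
// If nothing matches, return the policy's default decision.
function evaluateCommandSketch(
  command: string,
  rules: CommandRule[],
  fallback: Decision,
): Decision {
  const rank: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 }
  let result: Decision | undefined
  for (const rule of rules) {
    if (!globToRegExp(rule.pattern).test(command)) continue
    if (result === undefined || rank[rule.decision] > rank[result]) {
      result = rule.decision
    }
  }
  return result ?? fallback
}
```

Taking the strictest match (rather than the first or most specific) means a broad `*` allow can never override a narrower deny, which is the property a sandbox policy usually wants.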
…git helper

- @tanstack/ai-sandbox-local-process: SandboxHandle over host fs/child_process (no isolation, dev loop)
- virtual /workspace root mapped to a real host dir with path containment
- exec/spawn (duplex stdin, streamed stdout), localhost port channel, env, fork via dir copy, durable fs resume-by-dir
- core: createExecBackedGit helper (shared by providers without native git); bootstrap clones into the handle's own root
- 10 unit tests (fs/exec/spawn/lifecycle/fork/bootstrap/ensure); types + lint clean
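Mapping a virtual `/workspace` root onto a host directory with path containment comes down to resolving `..` segments before mapping and then re-checking the result. A minimal sketch using only `node:path`; `resolveInWorkspace` is a hypothetical helper, not the local-process provider's API.

```typescript
import * as path from 'node:path'

// Assumed virtual root exposed to the agent.
const VIRTUAL_ROOT = '/workspace'

// Hypothetical helper: map a virtual path (absolute under /workspace, or
// relative to it) onto the host root, rejecting anything that escapes it.
function resolveInWorkspace(hostRoot: string, virtualPath: string): string {
  // Resolve in POSIX space first so '..' segments collapse before mapping.
  const absVirtual = path.posix.resolve(VIRTUAL_ROOT, virtualPath)
  const rel = path.posix.relative(VIRTUAL_ROOT, absVirtual)
  if (rel === '..' || rel.startsWith('../')) {
    throw new Error(`path escapes workspace: ${virtualPath}`)
  }
  const root = path.resolve(hostRoot)
  const hostPath = path.resolve(root, ...rel.split('/').filter(Boolean))
  // Defence in depth: the final host path must still sit inside the root.
  const check = path.relative(root, hostPath)
  if (check === '..' || check.startsWith(`..${path.sep}`) || path.isAbsolute(check)) {
    throw new Error(`path escapes workspace: ${virtualPath}`)
  }
  return hostPath
}
```

Note this checks the lexical path only; it does not follow symlinks, so a provider that offers no isolation would still want to resolve real paths if symlinks inside the workspace are a concern.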
…runner

- @tanstack/ai: TextOptions.capabilities carries the middleware capability context so harness adapters can read provided capabilities (getSandbox(options.capabilities)) from chatStream; populated by the engine
- @tanstack/ai-sandbox: spawnNdjson/toLines spawn an agent CLI in a sandbox and stream parsed NDJSON stdout (the reusable harness-execution primitive)
- tests: toLines buffering + spawnNdjson parsing (core), real spawn + NDJSON via local-process (11); 25 core tests; types + lint clean
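The buffering that `toLines` has to handle exists because stdout chunks do not align with newlines: one JSON object can be split across two chunks. A minimal sketch of the idea, assuming an `AsyncIterable<string>` of chunks; `toLinesSketch` and `parseNdjson` are hypothetical names, not the package's signatures.

```typescript
// Re-chunk an arbitrary stream of text into complete lines, carrying any
// partial trailing line over to the next chunk.
async function* toLinesSketch(chunks: AsyncIterable<string>): AsyncGenerator<string> {
  let buffer = ''
  for await (const chunk of chunks) {
    buffer += chunk
    let newline: number
    while ((newline = buffer.indexOf('\n')) !== -1) {
      // Strip a trailing '\r' so CRLF output parses the same as LF.
      const line = buffer.slice(0, newline).replace(/\r$/, '')
      buffer = buffer.slice(newline + 1)
      if (line.trim() !== '') yield line
    }
  }
  // Flush a final line that was not newline-terminated.
  if (buffer.trim() !== '') yield buffer
}

// Parse each complete line as one JSON value (NDJSON).
async function* parseNdjson(chunks: AsyncIterable<string>): AsyncGenerator<unknown> {
  for await (const line of toLinesSketch(chunks)) {
    yield JSON.parse(line)
  }
}
```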
CodeRabbit: review skipped. This PR contains 322 files, which is 172 over the limit of 150.
Changeset Version Preview: 14 package(s) bumped directly, 30 bumped as dependents.
CI pipeline execution (Nx Cloud) for commit 70e8962.
@tanstack/ai
@tanstack/ai-acp
@tanstack/ai-angular
@tanstack/ai-anthropic
@tanstack/ai-bedrock
@tanstack/ai-claude-code
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-codex
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-grok-build
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-mcp
@tanstack/ai-mistral
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-opencode
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-sandbox
@tanstack/ai-sandbox-cloudflare
@tanstack/ai-sandbox-daytona
@tanstack/ai-sandbox-docker
@tanstack/ai-sandbox-local-process
@tanstack/ai-sandbox-vercel
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
…ential leakage

Security review (PR #774):

- argument injection: insert '--' end-of-options separators before positionals (clone url/target, add paths) and reject url/ref/dir/path values beginning with '-' (flag-smuggling guard)
- secrets in argv: stop embedding the auth token in the clone URL (leaked via ps/logs); use a one-shot credential.helper that reads the token from the child ENV, single-quoted so the outer shell never expands it
- 4 unit tests pinning: token absent from argv + present in env, '--' separators, leading-dash rejection, quote escaping
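Two of the hardening rules above (reject leading-dash values, and put `--` before positionals) plus POSIX single-quote escaping can be illustrated with a small argv builder. This is a sketch under assumptions: `buildCloneArgv`, `assertNotOption`, and `shellSingleQuote` are hypothetical names, and the exact credential-helper command the package uses is not reproduced here.

```typescript
// Reject values that git could mistake for an option (flag smuggling).
function assertNotOption(label: string, value: string): void {
  if (value.startsWith('-')) {
    throw new Error(`${label} must not start with '-': ${value}`)
  }
}

// Build argv for `git clone`, ending options with '--' so url/target are
// always parsed as positionals. The auth token is never part of argv.
function buildCloneArgv(url: string, target: string, ref?: string): string[] {
  assertNotOption('url', url)
  assertNotOption('dir', target)
  const args = ['git', 'clone']
  if (ref !== undefined) {
    assertNotOption('ref', ref)
    args.push('--branch', ref)
  }
  args.push('--', url, target)
  return args
}

// POSIX single-quoting: wrap in '...' and turn each embedded ' into '\''.
// Nothing inside single quotes is expanded by the shell.
function shellSingleQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`
}
```

The leading-dash check and the `--` separator overlap on purpose: `--` protects the positionals, while the check also covers values such as `ref` that are passed as option arguments.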
- @tanstack/ai-sandbox-docker: SandboxHandle over a Docker container
- create/resume-by-id/restoreSnapshot (commit image)/destroy; durable fs across stop/start
- exec + duplex spawn via dockerode exec + stream demux; fs over base64 piping (binary-safe, no tar dep)
- commit-based snapshot + fork; host.docker.internal gateway for host MCP reachability; publishPorts -> ports.connect
- exec-backed git reused from core
- 3 integration tests (gated on a reachable daemon), verified green against a real daemon: exec, fs + binary round-trip, snapshot, resume, spawn streaming, ensure + bootstrap
- pnpm-workspace: declare dockerode's optional native deps (cpu-features, ssh2) as not-built (JS fallback, local socket)
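The create / resume-by-id / restoreSnapshot operations listed here are the steps the core `ensure` order (resume -> restoreSnapshot -> create+bootstrap, per the @tanstack/ai-sandbox commit) chooses between. A minimal sketch of that fallback chain with capability-aware degradation; `ProviderLike`, `StoredState`, and `ensureSketch` are hypothetical simplifications, not the real `SandboxProvider` contract.

```typescript
// Hypothetical, simplified provider surface; the real contract differs.
interface ProviderLike<H> {
  capabilities: { resume: boolean; snapshots: boolean }
  resume(id: string): Promise<H | undefined>
  restoreSnapshot(snapshotId: string): Promise<H | undefined>
  create(): Promise<H>
}

// What a store might remember between runs (assumed shape).
interface StoredState {
  sandboxId?: string
  snapshotId?: string
}

// Try the cheapest path first and fall back: resume the live sandbox,
// else restore a snapshot, else create a fresh one and bootstrap it.
// Steps the provider cannot do (per its capabilities) are skipped.
async function ensureSketch<H>(
  provider: ProviderLike<H>,
  stored: StoredState,
  bootstrap: (handle: H) => Promise<void>,
): Promise<{ handle: H; via: 'resume' | 'restoreSnapshot' | 'create' }> {
  if (provider.capabilities.resume && stored.sandboxId) {
    const handle = await provider.resume(stored.sandboxId)
    if (handle !== undefined) return { handle, via: 'resume' }
  }
  if (provider.capabilities.snapshots && stored.snapshotId) {
    const handle = await provider.restoreSnapshot(stored.snapshotId)
    if (handle !== undefined) return { handle, via: 'restoreSnapshot' }
  }
  // Only a freshly created sandbox needs the workspace bootstrap.
  const handle = await provider.create()
  await bootstrap(handle)
  return { handle, via: 'create' }
}
```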
- claudeCodeText now declares requires: [SandboxCapability] and spawns the claude CLI INSIDE the sandbox via sandbox.process (claude -p --output-format stream-json), reusing translateSdkStream for the stdout NDJSON
- prompt fed via stdin (not argv); session id surfaced as before; emits a file.changed CUSTOM event with the git diff after the run
- permission-mode/allowed/disallowed/add-dir/max-turns/system-prompt mapped to CLI flags; default permission-mode bypassPermissions (the sandbox is isolated)
- drop the @anthropic-ai/claude-agent-sdk + @modelcontextprotocol/sdk deps; remove the in-process tool bridge (chat()-tools MCP proxy deferred; the adapter rejects tools for now); provider-options self-contained
- spawnNdjson gains an option to feed stdin
- deterministic test via a fake claude CLI in a real local-process sandbox (24 tests); types + lint clean
Runnable demo (examples/sandbox-coding-agent) that runs Claude Code inside a sandbox to fix a bug end-to-end via chat() + withSandbox:

- bootstraps a tiny git repo with a deliberate bug, asks the agent to fix it, streams output + prints the git diff
- Docker provider by default (installs the claude CLI in setup); SANDBOX=local runs on the host process
- README with prerequisites + run instructions for manual e2e verification
…lag mapping; changesets

- SandboxPolicyCapability: withSandbox provides the definition policy (conditionally); harness adapters read it via getOptional
- claude-code maps defineSandboxPolicy (default decision + fileWrite/network caps + tool-name command rules) onto --permission-mode/--allowedTools/--disallowedTools (best-effort; fine-grained command globs await the MCP permission-prompt tool)
- changesets for the sandbox layer + updated claude-code changeset for the in-sandbox behavior
- policy-map unit tests (5)
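A best-effort policy-to-flags mapping of the kind described here might look like the sketch below. `PolicySketch` is a hypothetical simplification of `defineSandboxPolicy`, and the specific choices (deny the `Edit`/`Write` tools when file writes are off, use `bypassPermissions` only for an allow-by-default policy, comma-join the tool lists) are assumptions for illustration, not the adapter's actual rules. `default` and `bypassPermissions` are Claude Code permission modes.

```typescript
type PolicyDecision = 'allow' | 'ask' | 'deny'

// Hypothetical, simplified policy: a default decision, a file-write cap,
// and per-tool-name rules.
interface PolicySketch {
  defaultDecision: PolicyDecision
  fileWrite: boolean
  tools: Record<string, PolicyDecision>
}

// Map the policy onto Claude Code CLI flags. The flags only express
// allow/deny lists, so 'ask' rules fall through to the permission mode.
function policyToClaudeFlags(policy: PolicySketch): string[] {
  const entries = Object.entries(policy.tools)
  const allowed = entries.filter(([, d]) => d === 'allow').map(([name]) => name)
  const denied = entries.filter(([, d]) => d === 'deny').map(([name]) => name)
  // Assumption: without file-write permission, deny the built-in edit tools.
  if (!policy.fileWrite) denied.push('Edit', 'Write')
  // Assumption: only an allow-by-default policy skips permission prompts.
  const mode = policy.defaultDecision === 'allow' ? 'bypassPermissions' : 'default'
  const flags = ['--permission-mode', mode]
  // Assumption: the CLI accepts a comma-separated tool list for these flags.
  if (allowed.length > 0) flags.push('--allowedTools', allowed.join(','))
  if (denied.length > 0) flags.push('--disallowedTools', denied.join(','))
  return flags
}
```

This is lossy by design, which matches the "best-effort" caveat above: per-command globs cannot be expressed as flat tool lists.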
- docs/sandbox/overview.md: mental model, providers, defineWorkspace/defineSandboxPolicy, lifecycle/resume, events, the runnable example (no as-casts; latest model id)
- docs/config.json: new Sandboxes section (addedAt 2026-06-16)
- packages/ai-sandbox/skills/ai-sandbox: agent skill covering the sandbox APIs + critical rules
- ship skills in the package files
- test:docs green
…n-sandbox agent

- startHostToolBridge: host-side Streamable-HTTP MCP server exposing chat() server tools; the in-sandbox claude calls mcp__tanstack__<tool>, proxied back to the host where execute() runs (closures/DB/secrets). Per-run bearer token; binds for host.docker.internal reachability from Docker
- adapter wires --mcp-config when tools are present, picks localhost vs host.docker.internal by provider, and tears the bridge down after the run; tools are no longer rejected
- 3 host-side tests via the MCP SDK client (list/call/error/auth), verified green without needing claude
- docs + skill updated to describe the tool-proxy
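A per-run bearer token is only useful if every bridge request is checked before any tool's `execute()` runs. A minimal sketch using Node's `crypto.randomBytes` and `crypto.timingSafeEqual`; `createRunToken` and `isAuthorized` are hypothetical helpers, not the bridge's actual API, and how the token reaches the agent's MCP config is left out.

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto'

// One unguessable token per run (256 bits, hex-encoded).
function createRunToken(): string {
  return randomBytes(32).toString('hex')
}

// Check an "Authorization: Bearer <token>" header in constant time.
function isAuthorized(header: string | undefined, token: string): boolean {
  const prefix = 'Bearer '
  if (!header || !header.startsWith(prefix)) return false
  const presented = Buffer.from(header.slice(prefix.length))
  const expected = Buffer.from(token)
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return presented.length === expected.length && timingSafeEqual(presented, expected)
}
```

Because the bridge is bound so that a Docker container can reach it via host.docker.internal, anything else that can reach that port could call it too; the token is what scopes the bridge to the one run.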
- @tanstack/ai-sandbox-cloudflare: cloudflareSandbox() on @cloudflare/sandbox (edge, inside a Worker)
- uniform SandboxHandle: exec, base64-backed fs, exec-backed git, exposePort preview URLs (previewHostname), setEnvVars; spawn via startProcess + onOutput queue
- ephemeral disk + no GA snapshots -> durableFilesystem/snapshots false (withSandbox re-bootstraps across cold starts); background processes have no stdin (documented; stdin-fed harnesses need local-process/docker)
- compiles against the real @cloudflare/sandbox types; 7 deterministic handle tests against a mock Sandbox (fs round-trip, exec, spawn queue, stdin limitation, port). Runtime verification pending a Workers runtime
- align the @cloudflare/workers-types version with the workspace (sherif)
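Turning a callback-style output hook (such as the `onOutput` callback mentioned here) into the async stream a `spawn` consumer reads is typically done with a small push queue. A generic sketch; `PushQueue` is hypothetical and not tied to the `@cloudflare/sandbox` API.

```typescript
// Hypothetical helper: buffers values pushed from callbacks (e.g. an output
// hook) and exposes them as an async iterator that ends after close().
class PushQueue<T> implements AsyncIterable<T> {
  private items: T[] = []
  private waiters: Array<(result: IteratorResult<T>) => void> = []
  private closed = false

  push(item: T): void {
    if (this.closed) return
    const waiter = this.waiters.shift()
    // Hand the item straight to a waiting consumer, or buffer it.
    if (waiter) waiter({ value: item, done: false })
    else this.items.push(item)
  }

  close(): void {
    this.closed = true
    // Release every consumer still waiting for data.
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true })
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.items.length > 0) {
          const value = this.items.shift() as T
          return Promise.resolve({ value, done: false })
        }
        if (this.closed) return Promise.resolve({ value: undefined, done: true })
        return new Promise<IteratorResult<T>>((resolve) => {
          this.waiters.push(resolve)
        })
      },
    }
  }
}
```

Buffered items are still drained after `close()`, so output that arrives just before process exit is not lost.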
- codexText declares requires: [SandboxCapability]; spawns 'codex exec --experimental-json' inside the sandbox (mirroring @openai/codex-sdk's own CLI invocation), prompt via stdin, JSONL thread events → existing translateThreadEvents
- sandbox mode / approval policy / reasoning effort / add-dir / skip-git-repo-check / config mapped to codex CLI flags; resume via 'resume <id>'
- drop @openai/codex-sdk + @modelcontextprotocol/sdk + the in-process tool bridge; provider-options self-contained; chat()-tools bridging deferred (rejects tools)
- deterministic fake-codex-CLI test in a real local-process sandbox (27 tests); types/lint/knip/sherif clean
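An argv builder in the spirit of the first two bullets might look like this. It is modeled loosely on how the commit says `@openai/codex-sdk` invokes the CLI; the `CodexRunOptions` shape is hypothetical, only a few of the mapped options are shown, and the flag order is an assumption. The prompt is deliberately absent from argv because it is written to stdin.

```typescript
// Hypothetical option shape; field names are illustrative.
interface CodexRunOptions {
  model?: string
  sandboxMode?: 'read-only' | 'workspace-write' | 'danger-full-access'
  skipGitRepoCheck?: boolean
  resumeThreadId?: string
}

// Build argv for a JSONL run. The prompt is not in argv: it goes to stdin.
function buildCodexArgv(opts: CodexRunOptions): string[] {
  const args = ['codex', 'exec', '--experimental-json']
  if (opts.model) args.push('--model', opts.model)
  if (opts.sandboxMode) args.push('--sandbox', opts.sandboxMode)
  if (opts.skipGitRepoCheck) args.push('--skip-git-repo-check')
  // Resuming appends the `resume <id>` subcommand after the flags.
  if (opts.resumeThreadId) args.push('resume', opts.resumeThreadId)
  return args
}
```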
The Grok CLI no longer accepts --cwd on `grok agent`; working directory is conveyed via ACP newSession/loadSession instead. Update command builders, tests, and the ai-acp README example.
Drop the harness package, its changeset, example integrations, Dockerfile install step, and docs. Sandbox examples now support claude-code, codex, and grok only.
Remove sandbox-coding-agent, sandbox-issue-triage, and ts-react-coding-agent; sandbox-cloudflare and sandbox-web cover the same ground. Repoint the docs and ai-sandbox README references to the surviving examples and prune the lockfile.
Add acpCompatible / acpCompatibleText — the harness equivalent of openaiCompatible. Build a chat() text adapter for any ACP-compliant agent CLI and plug it into a sandbox without a dedicated package: configure command (stdio) or openTransport (WebSocket/custom) once, select a model per call. It handles sandbox resolution, tool->MCP bridging, session resume, permission modes (headless/interactive), abort, and AG-UI translation. Also exports the shared buildAcpPrompt helper. Includes an end-to-end stdio test against a real ACP agent, plus README and docs pages.
- initialize handshake now sends clientInfo and validates the negotiated protocol version (closes the connection if the agent requires a newer one).
- translator surfaces non-text agent content (image/audio/resource blocks) as a CUSTOM event via a new optional 'contentEvent' label instead of dropping it; acpCompatible enables it as '<name>.message-content'. grok-build is unchanged (it doesn't set the label).
- tool results preserve non-text content (diffs, terminal, images) instead of collapsing to a status stub.
- document protocol coverage (covered / surfaced-as-custom / not-implemented) in the README and docs so we don't overclaim full-spec coverage.
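The version check in the first bullet follows the ACP initialize rule: the client proposes the version it implements, the agent answers with that version or the latest one it supports, and a client that cannot speak the answered version closes the connection. A minimal sketch; `CLIENT_PROTOCOL_VERSION`, `InitializeResponseLike`, and `negotiateProtocol` are illustrative assumptions, not the package's code.

```typescript
// Assumed: the integer protocol version this client implements.
const CLIENT_PROTOCOL_VERSION = 1

// Only the field this check needs from the agent's initialize response.
interface InitializeResponseLike {
  protocolVersion: number
}

// Accept the agent's answer only if the client implements that version;
// otherwise close the connection instead of speaking a mismatched protocol.
function negotiateProtocol(
  response: InitializeResponseLike,
  close: (reason: string) => void,
): number | undefined {
  if (response.protocolVersion > CLIENT_PROTOCOL_VERSION) {
    close(
      `agent requires protocol v${response.protocolVersion}; client supports v${CLIENT_PROTOCOL_VERSION}`,
    )
    return undefined
  }
  return response.protocolVersion
}
```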
Parity with openaiCompatible: declare a 'models' tuple for a type-safe model
union (harness('unknown') is now a compile error; omit to accept any string),
and a 'modelOptions' type-only brand ({} as { ... }) for the per-call options
accepted via chat({ modelOptions }). Declared options are merged with the base
ACP options and surfaced on ctx.modelOptions in command/openTransport so they
can be turned into CLI flags.
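The `models` tuple and type-only `modelOptions` brand follow a common TypeScript pattern: a `const` type parameter (TypeScript 5.0+) captures the tuple as a union of string literals, and a phantom value such as `{} as { ... }` carries a type with no runtime data. A minimal sketch; `defineHarnessSketch` and its return shape are hypothetical, not the `acpCompatible` signature.

```typescript
// Captures `models` as a literal union and `modelOptions` as a type-only
// brand. Omitting `models` falls back to the default, i.e. any string.
function defineHarnessSketch<
  const TModels extends readonly string[] = string[],
  TOptions extends object = Record<string, never>,
>(config: { name: string; models?: TModels; modelOptions?: TOptions }) {
  return (model: TModels[number]) => ({
    name: config.name,
    model,
    // Per-call options are typed from the brand; nothing is read at runtime.
    withOptions: (options: TOptions) => ({ name: config.name, model, options }),
  })
}

// Usage: `pi('unknown')` would fail to compile, since the model union is
// 'pi-fast' | 'pi-large'.
const pi = defineHarnessSketch({
  name: 'pi',
  models: ['pi-fast', 'pi-large'],
  modelOptions: {} as { thinking?: boolean },
})
const run = pi('pi-fast').withOptions({ thinking: true })
```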
Add a dedicated sandbox Harnesses page listing the built-in harness adapters (Grok Build, Claude Code, Codex, OpenCode) and acpCompatible for any ACP agent, linking to the official ACP agents list + registry. Cross-link it from the overview's harness axis and the Providers page.
Project withSandbox workspace skills into acpCompatible harnesses: MCP skills go over ACP's native newSession mcpServers (resolving secret/bearer headers), and gitSkills are linked into a new harness-declared skillsDir flag (e.g. .pi/skills, like Claude Code's .claude/skills). fileSkill/instructions/secrets are handled by bootstrap; agentSkill/plugins are skipped with a warning. Exposes workspaceMcpServers and projectAcpWorkspace.

Add an end-to-end sandbox-provisioning test suite driving a fake ACP agent in a real local-process sandbox. It asserts that secrets reach the agent env, fileSkill + instructions land as files, setup runs, permission modes behave (bypass allows / default rejects / interactive emits an approval event), workspace MCP skills reach newSession, and gitSkills link into skillsDir.

Fix a cross-provider path bug: the projector now runs shell copies relative to the workspace root (the exec cwd) instead of the virtual /workspace absolute path, which only fs.* remaps.
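Projecting workspace MCP skills into `newSession` means resolving secret references only at session time, so plaintext values never sit in the workspace definition. A minimal sketch; `McpSkillSketch`, `HeaderValue`, and `toAcpMcpServers` are hypothetical, and the `{ type: 'http', name, url, headers: [{ name, value }] }` entry shape is an assumption about ACP's HTTP MCP server schema that should be checked against the spec.

```typescript
// Hypothetical workspace MCP skill: header values are literals or secret refs.
type HeaderValue = string | { secret: string }

interface McpSkillSketch {
  name: string
  url: string
  headers?: Record<string, HeaderValue>
}

// Assumed ACP-style HTTP MCP server entry (headers as a name/value list).
interface AcpHttpMcpServer {
  type: 'http'
  name: string
  url: string
  headers: Array<{ name: string; value: string }>
}

// Resolve secret refs at session time so plaintext never lives in the config.
function toAcpMcpServers(
  skills: McpSkillSketch[],
  resolveSecret: (key: string) => string,
): AcpHttpMcpServer[] {
  return skills.map(
    (skill): AcpHttpMcpServer => ({
      type: 'http',
      name: skill.name,
      url: skill.url,
      headers: Object.entries(skill.headers ?? {}).map(([name, v]) => ({
        name,
        value: typeof v === 'string' ? v : resolveSecret(v.secret),
      })),
    }),
  )
}
```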
# Conflicts:
# pnpm-lock.yaml
The package was removed from the workspace, so the changeset's reference to it
failed Changeset Preview ("package ... not in the workspace").
- Drop the @daytona/* minimumReleaseAge exclusions: 0.191.0 was published 2026-06-25, well past the 1-day gate, so they're no-ops (the lockfile pins the matched set).
- Remove the ad-hoc repro-local-grok.mjs / sandbox-live-smoke.mjs dev helpers (not part of the shipped surface) and the ai-sandbox README section that referenced the smoke script.
# Conflicts:
# docs/config.json
# examples/ts-react-chat/package.json
# pnpm-lock.yaml
# testing/e2e/package.json
Two axes: harness (what runs) × provider (where)

Providers (all behind the same `SandboxHandle`/`SandboxProvider` contract):

- `@tanstack/ai-sandbox-local-process`
- `@tanstack/ai-sandbox-docker`
- `@tanstack/ai-sandbox-daytona`
- `@tanstack/ai-sandbox-vercel`
- `@tanstack/ai-sandbox-cloudflare`

Harnesses (each runs in-sandbox and declares `requires: [SandboxCapability]`):

- `@tanstack/ai-claude-code`: `claude -p --output-format stream-json` over stdin; an MCP tool-proxy bridges `chat()` tools; policy → permission-prompt + interactive approvals.
- `@tanstack/ai-codex`: `codex exec --experimental-json`; MCP tool-proxy; policy → sandbox/approval/network knobs.
- `@tanstack/ai-opencode`: `opencode serve` in-sandbox over the SDK; permission modes + interactive approvals; MCP tool-proxy.
- `@tanstack/ai-grok-build`: `grok` over ACP (auto stdio/WebSocket; `grok agent serve`), with a legacy `streaming-json` path.

`@tanstack/ai-acp`: the ACP layer + `acpCompatible`

Shared Agent Client Protocol plumbing (transport, session, permissions, AG-UI translation) plus `acpCompatible`, the harness equivalent of `openaiCompatible`. Plug any ACP-compliant agent CLI (`pi`, `gemini --acp`, dozens more) into a sandbox without a dedicated package:

- Parity with `openaiCompatible`: `models` union + `modelOptions` brand; `command` (stdio) or `openTransport` (WebSocket/custom).
- Workspace MCP skills → `newSession` `mcpServers` (secrets resolved); `gitSkills` → `skillsDir`; `fileSkill`/`instructions`/`secrets` via bootstrap.
- Permission modes (`headless`/`interactive`); session resume; abort; non-text agent/tool content surfaced as `CUSTOM` events.
- `clientInfo` + protocol-version negotiation; documented coverage (covered / surfaced-as-CUSTOM / not-implemented) so we don't overclaim.

Core + provisioning

- `@tanstack/ai-sandbox`: `defineSandbox()` (lazy controller + resume → restoreSnapshot → create+bootstrap state machine), `withSandbox()`, `defineWorkspace()`, `bootstrapWorkspace`, `defineSandboxPolicy()` + `evaluateCommand`, capability tokens (`SandboxCapability` + optional `SandboxStore`/`Locks`/`SandboxPolicy`/`Projection`), `createExecBackedGit`, `spawnNdjson`, the host MCP tool-proxy bridge, and the shared interactive-approval primitives.
- `createSecrets`: type-safe secret refs injected into the agent env at create/resume, never persisted to snapshots/store/event log.
- `fileSkill`/`agentSkill`/`mcpSkill`/`gitSkill`, `instructions` → `AGENTS.md`, `scripts`, `plugins`; each harness projects these into its native format.
- `onFile`/`onFileCreate`/`onFileChange`/`onFileDelete` chat middleware + `SandboxFileEvent`.
- `@tanstack/ai`: `TextOptions.capabilities`, the `sandbox` middleware hook group, middleware type exports.

Examples & docs

- `examples/sandbox-web`: pick harness × provider per run from the UI; the agent scaffolds an app, runs the dev server, and returns a live preview URL.
- `examples/sandbox-cloudflare`: the same agent at the edge.
- `docs/sandbox/*` section (overview, quick start, providers, harnesses, workspace, provisioning, tools, policy, lifecycle, events, observability, cloudflare) + adapter pages (claude-code, codex, opencode, grok-build, acp-compatible). All code samples type-check under kiira.

Verification

- End-to-end sandbox-provisioning suite (`@tanstack/ai-acp`): a fake ACP agent in a real sandbox proves that secrets reach the agent env, `fileSkill`/`instructions` land as files, `setup` runs, permission modes behave, workspace MCP skills reach `newSession`, and `gitSkills` link into `skillsDir`.
- The `@tanstack/ai` suite still passes; types / eslint / build / publint / kiira all green; changesets for every package.

Remaining (documented)

- `defineSandboxPolicy` → permission modes + `approval-requested` events.