TanStack · tombeckenham · Jun 26, 2026 · Jun 26, 2026 · coderabbitai · Jun 26, 2026
diff --git a/.changeset/anthropic-max-tokens-default.md b/.changeset/anthropic-max-tokens-default.md
@@ -0,0 +1,23 @@
+---
+'@tanstack/ai-anthropic': patch
+---
+
+Default Anthropic `max_tokens` to the selected model's real output ceiling
+(`max_output_tokens` from model metadata — e.g. 64K for Sonnet, 128K for Opus)
+when the caller doesn't pass one, instead of a hard-coded `1024` that silently
+truncated long responses with `stop_reason: "max_tokens"` (#849). Unknown
+models fall back to a safe constant. `max_tokens` is a ceiling, not a
+reservation, so this costs nothing unless the model genuinely produces more.
+
+The adapter also now logs a warning when a response is truncated while using the
+defaulted (caller-unspecified) cap, so the truncation isn't silently attributed
+to the model "doing nothing". Callers that set `modelOptions.max_tokens`
+explicitly are unaffected.
+
+The non-streaming structured-output path (`structuredOutput()`) clamps this
+default to the Anthropic SDK's non-streaming-safe limit (~21K tokens). The SDK
+refuses a non-streaming request whose `max_tokens` could exceed its 10-minute
+timeout, so without the clamp the full-ceiling default would make every
+`chat({ outputSchema })` call on a fallback-path model throw "Streaming is
+required for operations that may take longer than 10 minutes". The streaming
+chat path keeps the model's full ceiling.
diff --git a/docs/adapters/anthropic.md b/docs/adapters/anthropic.md
@@ -136,6 +136,12 @@ const stream = chat({
 
 > If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options).
 
+#### `max_tokens` default
+
+Anthropic's Messages API _requires_ `max_tokens` on every request, so the adapter always sends a value. When you don't set `modelOptions.max_tokens`, it defaults to the selected model's full output ceiling (`max_output_tokens` from the model metadata — e.g. 64K for Sonnet, 128K for Opus), falling back to a safe constant for unrecognized models. `max_tokens` is a ceiling, not a reservation — billing is on tokens actually generated — so this default costs nothing extra and avoids the silent mid-response truncation (`stop_reason: "max_tokens"`) that a low default would cause. Set `max_tokens` explicitly only when you want to _cap_ output below the model ceiling. If a response is truncated while using the default cap, the adapter logs a warning (visible with [debug logging](../advanced/debug-logging) enabled).
+
-#### `max_tokens` default
-
-Anthropic's Messages API _requires_ `max_tokens` on every request, so the adapter always sends a value. When you don't set `modelOptions.max_tokens`, it defaults to the selected model's full output ceiling (`max_output_tokens` from the model metadata — e.g. 64K for Sonnet, 128K for Opus), falling back to a safe constant for unrecognized models. `max_tokens` is a ceiling, not a reservation — billing is on tokens actually generated — so this default costs nothing extra and avoids the silent mid-response truncation (`stop_reason: "max_tokens"`) that a low default would cause. Set `max_tokens` explicitly only when you want to _cap_ output below the model ceiling. If a response is truncated while using the default cap, the adapter logs a warning (visible with [debug logging](../advanced/debug-logging) enabled).
+### `max_tokens` default
-#### `max_tokens` default
-
-Anthropic's Messages API _requires_ `max_tokens` on every request, so the adapter always sends a value. When you don't set `modelOptions.max_tokens`, it defaults to the selected model's full output ceiling (`max_output_tokens` from the model metadata — e.g. 64K for Sonnet, 128K for Opus), falling back to a safe constant for unrecognized models. `max_tokens` is a ceiling, not a reservation — billing is on tokens actually generated — so this default costs nothing extra and avoids the silent mid-response truncation (`stop_reason: "max_tokens"`) that a low default would cause. Set `max_tokens` explicitly only when you want to _cap_ output below the model ceiling. If a response is truncated while using the default cap, the adapter logs a warning (visible with [debug logging](../advanced/debug-logging) enabled).
+### `max_tokens` default
+One exception: structured output (`chat({ outputSchema })`) on models that use the non-streaming finalization path clamps this default to ~21K tokens. The Anthropic SDK rejects a non-streaming request whose `max_tokens` could exceed its 10-minute timeout, so the full ceiling can't be used there. Streaming chat is unaffected. To raise the structured-output ceiling toward a model's true max, stream the response.
+
 ### Thinking (Extended Thinking)
 
 Enable extended thinking with a token budget. This allows Claude to show its reasoning process, which is streamed as `thinking` chunks:

diff --git a/docs/config.json b/docs/config.json
@@ -431,7 +431,8 @@
         {
           "label": "Anthropic",
           "to": "adapters/anthropic",
-          "addedAt": "2026-04-15"
+          "addedAt": "2026-04-15",
+          "updatedAt": "2026-06-26"
         },
         {
           "label": "Google Gemini",

diff --git a/packages/ai-anthropic/src/adapters/text.ts b/packages/ai-anthropic/src/adapters/text.ts
@@ -10,7 +10,10 @@ import {
   generateId,
   getAnthropicApiKeyFromEnv,
 } from '../utils'
-import { ANTHROPIC_COMBINED_TOOLS_AND_SCHEMA_MODELS } from '../model-meta'
+import {
+  ANTHROPIC_COMBINED_TOOLS_AND_SCHEMA_MODELS,
+  getAnthropicDefaultMaxTokens,
+} from '../model-meta'
 import type {
   ANTHROPIC_MODELS,
   AnthropicChatModelProviderOptionsByName,
@@ -263,7 +266,12 @@ export class AnthropicTextAdapter<
     const { chatOptions, outputSchema } = options
     const { logger } = chatOptions
 
-    const requestParams = this.mapCommonOptionsToAnthropic(chatOptions)
+    // `structuredOutput()` issues a non-streaming `messages.create({ stream:
+    // false })` below, so the defaulted `max_tokens` must stay under the SDK's
+    // non-streaming 10-minute guard (issue #849) — pass `stream: false`.
+    const requestParams = this.mapCommonOptionsToAnthropic(chatOptions, {
+      stream: false,
+    })
 
     // Create a tool that will capture the structured output
     // Anthropic's SDK requires input_schema with type: 'object' literal
@@ -352,6 +360,7 @@ export class AnthropicTextAdapter<
 
   private mapCommonOptionsToAnthropic(
     options: TextOptions<AnthropicTextProviderOptions>,
+    { stream = true }: { stream?: boolean } = {},
   ) {
     const modelOptions = options.modelOptions
 
@@ -420,7 +429,18 @@ export class AnthropicTextAdapter<
       validProviderOptions.thinking?.type === 'enabled'
         ? validProviderOptions.thinking.budget_tokens
         : undefined
-    const defaultMaxTokens = modelOptions?.max_tokens ?? 1024
+    // Anthropic's Messages API *requires* `max_tokens`, so we must always send a
+    // value. When the caller doesn't specify one, default to the resolved
+    // model's real output ceiling (from model-meta) rather than a low constant
+    // that silently truncates long responses with `stop_reason: "max_tokens"`
+    // (issue #849). `max_tokens` is a ceiling, not a reservation — billing is on
+    // tokens actually generated, so a higher default costs nothing extra.
+    // For non-streaming requests (the `structuredOutput()` path) the default is
+    // clamped to the SDK's non-streaming-safe limit so it doesn't trip the
+    // "streaming required" 10-minute guard — see getAnthropicDefaultMaxTokens.
+    const defaultMaxTokens =
+      modelOptions?.max_tokens ??
+      getAnthropicDefaultMaxTokens(this.model, { stream })
     const maxTokens =
       thinkingBudget && thinkingBudget >= defaultMaxTokens
         ? thinkingBudget + 1
@@ -1181,6 +1201,22 @@ export class AnthropicTextAdapter<
                 break
               }
               case 'max_tokens': {
+                // Surface a warning when the truncating cap was the
+                // adapter-supplied default (caller didn't pass `max_tokens`), so
+                // the truncation isn't silently attributed to the model "doing
+                // nothing" (issue #849). When the caller set `max_tokens`
+                // themselves, hitting it is their own deliberate ceiling.
+                if (options.modelOptions?.max_tokens == null) {
+                  const defaultedMaxTokens = getAnthropicDefaultMaxTokens(model)
+                  logger.warn(
+                    `anthropic response truncated at the default max_tokens (${defaultedMaxTokens}) for model=${model}; pass maxTokens (or modelOptions.max_tokens) to raise the output ceiling`,
+                    {
+                      source: 'anthropic.processAnthropicStream',
+                      model,
+                      defaultedMaxTokens,
+                    },
+                  )
+                }
                 yield {
                   type: EventType.RUN_ERROR,
                   model,

diff --git a/packages/ai-anthropic/src/model-meta.ts b/packages/ai-anthropic/src/model-meta.ts
@@ -733,6 +733,87 @@ export const ANTHROPIC_MODELS = [
   CLAUDE_OPUS_4_8_FAST.id,
 ] as const
 
+/**
+ * Fallback `max_tokens` ceiling for a model whose metadata carries no
+ * `max_output_tokens` (e.g. an unrecognized model id). Anthropic's Messages
+ * API *requires* `max_tokens`, so the adapter must always send a value. 64K is
+ * the output ceiling of the current mainstream Claude tier (Sonnet/Haiku 4.5),
+ * so it's a sane default for an unknown — almost certainly modern — model and
+ * avoids silently truncating long generations (issue #849). Recognized models
+ * use their exact `max_output_tokens` from {@link ANTHROPIC_MODEL_MAX_OUTPUT_TOKENS}
+ * (e.g. 128K for Opus), so this fallback only ever applies to ids not in the
+ * map.
+ */
+export const ANTHROPIC_DEFAULT_MAX_OUTPUT_TOKENS = 64_000
+
+/**
+ * Runtime lookup of each model's maximum output-token ceiling, keyed by model
+ * id. Lets the text adapter default the required `max_tokens` request field to
+ * the model's real ceiling when the caller doesn't specify one, rather than a
+ * low constant that truncates responses mid-stream (issue #849).
+ *
+ * Kept in sync with {@link ANTHROPIC_MODELS} by `scripts/sync-provider-models.ts`
+ * — when that script adds a model it also inserts the model's `max_output_tokens`
+ * here, so a freshly-synced model resolves to its real ceiling rather than the
+ * fallback above.
+ */
+const ANTHROPIC_MODEL_MAX_OUTPUT_TOKENS: Record<string, number> = {
+  [CLAUDE_OPUS_4_6.id]: CLAUDE_OPUS_4_6.max_output_tokens,
+  [CLAUDE_OPUS_4_5.id]: CLAUDE_OPUS_4_5.max_output_tokens,
+  [CLAUDE_SONNET_4_6.id]: CLAUDE_SONNET_4_6.max_output_tokens,
+  [CLAUDE_SONNET_4_5.id]: CLAUDE_SONNET_4_5.max_output_tokens,
+  [CLAUDE_HAIKU_4_5.id]: CLAUDE_HAIKU_4_5.max_output_tokens,
+  [CLAUDE_OPUS_4_1.id]: CLAUDE_OPUS_4_1.max_output_tokens,
+  [CLAUDE_SONNET_4.id]: CLAUDE_SONNET_4.max_output_tokens,
+  [CLAUDE_SONNET_3_7.id]: CLAUDE_SONNET_3_7.max_output_tokens,
+  [CLAUDE_OPUS_4.id]: CLAUDE_OPUS_4.max_output_tokens,
+  [CLAUDE_HAIKU_3_5.id]: CLAUDE_HAIKU_3_5.max_output_tokens,
+  [CLAUDE_HAIKU_3.id]: CLAUDE_HAIKU_3.max_output_tokens,
+  [CLAUDE_OPUS_4_6_FAST.id]: CLAUDE_OPUS_4_6_FAST.max_output_tokens,
+  [CLAUDE_OPUS_4_7.id]: CLAUDE_OPUS_4_7.max_output_tokens,
+  [CLAUDE_OPUS_4_7_FAST.id]: CLAUDE_OPUS_4_7_FAST.max_output_tokens,
+  [CLAUDE_OPUS_4_8.id]: CLAUDE_OPUS_4_8.max_output_tokens,
+  [CLAUDE_OPUS_4_8_FAST.id]: CLAUDE_OPUS_4_8_FAST.max_output_tokens,
+}
+
+/**
+ * Largest `max_tokens` the Anthropic SDK permits on a **non-streaming**
+ * request. The SDK refuses to make a non-streaming call it estimates could
+ * exceed its 10-minute timeout, computed as
+ * `(60min * max_tokens) / 128_000 > 10min` — i.e. it throws
+ * `"Streaming is required for operations that may take longer than 10 minutes"`
+ * once `max_tokens > 128_000 * 10 / 60 ≈ 21_333`
+ * (`@anthropic-ai/sdk`'s `calculateNonstreamingTimeout`). The text adapter's
+ * only non-streaming call is the forced-tool `structuredOutput()` request, so
+ * its defaulted ceiling must stay at or below this; the streaming chat path
+ * keeps the model's full {@link getAnthropicDefaultMaxTokens} ceiling. We sit
+ * just under the boundary (`21_333` would round-trip to exactly 10min). This
+ * caps only the *default* — an explicit oversized `max_tokens` from the caller
+ * still surfaces the SDK's "use streaming" error, which is the correct signal.
+ */
+export const ANTHROPIC_MAX_NONSTREAMING_TOKENS = 21_000
+
+/**
+ * Resolve the default `max_tokens` for a model: its known `max_output_tokens`
+ * ceiling, or {@link ANTHROPIC_DEFAULT_MAX_OUTPUT_TOKENS} for unknown models.
+ * Callers that pass an explicit `max_tokens` bypass this entirely.
+ *
+ * Pass `stream: false` for non-streaming requests (the `structuredOutput()`
+ * path): the result is then clamped to {@link ANTHROPIC_MAX_NONSTREAMING_TOKENS}
+ * so the defaulted ceiling doesn't trip the SDK's non-streaming 10-minute guard
+ * (issue #849). Streaming requests (the default) are unaffected and get the
+ * model's full ceiling.
+ */
+export function getAnthropicDefaultMaxTokens(
+  model: string,
+  { stream = true }: { stream?: boolean } = {},
+): number {
+  const ceiling =
+    ANTHROPIC_MODEL_MAX_OUTPUT_TOKENS[model] ??
+    ANTHROPIC_DEFAULT_MAX_OUTPUT_TOKENS
+  return stream ? ceiling : Math.min(ceiling, ANTHROPIC_MAX_NONSTREAMING_TOKENS)
+}
+
 /**
  * Anthropic models that support combining `tools` + JSON-Schema-constrained
  * output in a single streaming Messages request (per issue #605). GA'd

diff --git a/packages/ai-anthropic/tests/anthropic-adapter.test.ts b/packages/ai-anthropic/tests/anthropic-adapter.test.ts
@@ -2,6 +2,7 @@ import { describe, it, expect, beforeEach, vi } from 'vitest'
 import { chat, type Tool, type StreamChunk } from '@tanstack/ai'
 import { AnthropicTextAdapter } from '../src/adapters/text'
 import type { AnthropicTextProviderOptions } from '../src/adapters/text'
+import { ANTHROPIC_MAX_NONSTREAMING_TOKENS } from '../src/model-meta'
 import { z } from 'zod'
 
 const mocks = vi.hoisted(() => {
@@ -444,7 +445,7 @@ describe('Anthropic adapter option mapping', () => {
     expect(payload.top_p).toBe(0.7)
   })
 
-  it('defaults max_tokens to 1024 when not provided via modelOptions', async () => {
+  it("defaults max_tokens to the model's max_output_tokens when not provided via modelOptions (#849)", async () => {
     mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok'))
 
     const adapter = createAdapter('claude-3-7-sonnet')
@@ -457,7 +458,135 @@ describe('Anthropic adapter option mapping', () => {
     }
 
     const [payload] = mocks.betaMessagesCreate.mock.calls[0]!
-    expect(payload.max_tokens).toBe(1024)
+    // claude-3-7-sonnet's model-meta max_output_tokens is 64_000 — not the old
+    // hard-coded 1024 floor that silently truncated long responses.
+    expect(payload.max_tokens).toBe(64_000)
+  })
+
+  it('warns when the default max_tokens cap truncates the response (#849)', async () => {
+    // Stream that ends with stop_reason: "max_tokens" — the model hit the cap.
+    const truncatedStream = (async function* () {
+      yield {
+        type: 'content_block_start',
+        index: 0,
+        content_block: { type: 'text', text: '' },
+      }
+      yield {
+        type: 'content_block_delta',
+        index: 0,
+        delta: { type: 'text_delta', text: 'partial output' },
+      }
+      yield { type: 'content_block_stop', index: 0 }
+      yield {
+        type: 'message_delta',
+        delta: { stop_reason: 'max_tokens' },
+        usage: { output_tokens: 64_000 },
+      }
+      yield { type: 'message_stop' }
+    })()
+    mocks.betaMessagesCreate.mockResolvedValueOnce(truncatedStream)
+
+    const adapter = createAdapter('claude-3-7-sonnet')
+
+    const logger = {
+      debug: vi.fn(),
+      info: vi.fn(),
+      warn: vi.fn(),
+      error: vi.fn(),
+    }
+
+    for await (const _ of chat({
+      adapter,
+      messages: [{ role: 'user', content: 'Write a long essay' }],
+      debug: { logger, errors: true },
+    })) {
+      // consume stream
+    }
+
+    const truncationWarning = logger.warn.mock.calls.find((call) =>
+      String(call[0]).includes('truncated at the default max_tokens'),
+    )
+    expect(truncationWarning).toBeDefined()
+  })
+
+  it('does not warn about truncation when the caller set max_tokens explicitly (#849)', async () => {
+    const truncatedStream = (async function* () {
+      yield {
+        type: 'message_delta',
+        delta: { stop_reason: 'max_tokens' },
+        usage: { output_tokens: 100 },
+      }
+      yield { type: 'message_stop' }
+    })()
+    mocks.betaMessagesCreate.mockResolvedValueOnce(truncatedStream)
+
+    const adapter = createAdapter('claude-3-7-sonnet')
+
+    const logger = {
+      debug: vi.fn(),
+      info: vi.fn(),
+      warn: vi.fn(),
+      error: vi.fn(),
+    }
+
+    for await (const _ of chat({
+      adapter,
+      messages: [{ role: 'user', content: 'Hi' }],
+      modelOptions: { max_tokens: 100 } satisfies AnthropicTextProviderOptions,
+      debug: { logger, errors: true },
+    })) {
+      // consume stream
+    }
+
+    const truncationWarning = logger.warn.mock.calls.find((call) =>
+      String(call[0]).includes('truncated at the default max_tokens'),
+    )
+    expect(truncationWarning).toBeUndefined()
+  })
+
+  it('clamps the default max_tokens on the non-streaming structured-output path so it never trips the SDK 10-minute guard (#849)', async () => {
+    // The structured-output fallback issues a NON-streaming
+    // `messages.create({ stream: false })`. The Anthropic SDK throws
+    // "Streaming is required for operations that may take longer than 10
+    // minutes" once max_tokens exceeds ~21_333, so the defaulted ceiling must
+    // be clamped here even though the streaming chat path keeps the full 64K.
+    mocks.betaMessagesCreate.mockResolvedValueOnce({
+      id: 'msg_structured',
+      type: 'message',
+      role: 'assistant',
+      model: 'claude-3-7-sonnet',
+      content: [
+        {
+          type: 'tool_use',
+          id: 'toolu_structured_output',
+          name: 'structured_output',
+          input: { recommendation: 'Strat', price: 1299 },
+        },
+      ],
+      stop_reason: 'tool_use',
+      usage: { input_tokens: 10, output_tokens: 20 },
+    })
+
+    const adapter = createAdapter('claude-3-7-sonnet')
+
+    for await (const _ of chat({
+      adapter,
+      messages: [{ role: 'user', content: 'recommend a guitar as json' }],
+      outputSchema: z.object({
+        recommendation: z.string(),
+        price: z.number(),
+      }),
+      stream: true,
+    })) {
+      // consume stream
+    }
+
+    const [payload] = mocks.betaMessagesCreate.mock.calls[0]!
+    expect(payload.stream).toBe(false)
+    // Clamped to the non-streaming limit — NOT claude-3-7-sonnet's full 64K
+    // streaming ceiling, which would make the SDK throw before the request.
+    expect(payload.max_tokens).toBe(ANTHROPIC_MAX_NONSTREAMING_TOKENS)
+    expect(payload.max_tokens).toBeLessThanOrEqual(21_333)
   })
 
   it('native combined mode (#605): wires outputSchema into output_format alongside tools on Claude 4.5+', async () => {