23 changes: 23 additions & 0 deletions .changeset/anthropic-max-tokens-default.md
@@ -0,0 +1,23 @@
---
'@tanstack/ai-anthropic': patch
---

Default Anthropic `max_tokens` to the selected model's real output ceiling
(`max_output_tokens` from model metadata — e.g. 64K for Sonnet, 128K for Opus)
when the caller doesn't pass one, instead of a hard-coded `1024` that silently
truncated long responses with `stop_reason: "max_tokens"` (#849). Unknown
models fall back to a safe constant. `max_tokens` is a ceiling, not a
reservation, so this costs nothing unless the model genuinely produces more.

The adapter also now logs a warning when a response is truncated while using the
defaulted (caller-unspecified) cap, so the truncation isn't silently attributed
to the model "doing nothing". Callers that set `modelOptions.max_tokens`
explicitly are unaffected.
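
The warning condition described above can be pictured as a small predicate. This is a minimal sketch, not the adapter's code: `shouldWarnOnTruncation` and the `StopReason` union are hypothetical names introduced for illustration, and the only behavior assumed is what the paragraph states (warn when `stop_reason` is `"max_tokens"` and the caller passed no `max_tokens`).

```typescript
// Hypothetical names for illustration; not part of @tanstack/ai-anthropic.
// A subset of the stop reasons the Messages API can report.
type StopReason = 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use'

function shouldWarnOnTruncation(
  stopReason: StopReason,
  explicitMaxTokens: number | null | undefined,
): boolean {
  // Warn only when the cap that was hit is the adapter-supplied default.
  // If the caller chose the cap, hitting it is their deliberate ceiling.
  return stopReason === 'max_tokens' && explicitMaxTokens == null
}

console.log(shouldWarnOnTruncation('max_tokens', undefined))
console.log(shouldWarnOnTruncation('max_tokens', 100))
```

Under these assumptions the first call reports a warning-worthy truncation and the second does not, because the caller supplied the cap.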

The non-streaming structured-output path (`structuredOutput()`) clamps this
default to the Anthropic SDK's non-streaming-safe limit (~21K tokens). The SDK
refuses a non-streaming request whose `max_tokens` could exceed its 10-minute
timeout, so without the clamp the full-ceiling default would make every
`chat({ outputSchema })` call on a fallback-path model throw "Streaming is
required for operations that may take longer than 10 minutes". The streaming
chat path keeps the model's full ceiling.
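
To see where the ~21K figure comes from, the guard's arithmetic can be worked through directly. The sketch below assumes the formula quoted in this PR's `model-meta.ts` comment, `(60min * max_tokens) / 128_000 > 10min`; the function and constant names are hypothetical, and this is not the SDK's own code.

```typescript
// Assumption: mirrors the formula quoted in this PR for the SDK's
// non-streaming guard. Names are illustrative only.
const TOKENS_PER_HOUR = 128_000
const LIMIT_MINUTES = 10

// Estimated duration, in minutes, of a non-streaming request.
function estimatedMinutes(maxTokens: number): number {
  return (60 * maxTokens) / TOKENS_PER_HOUR
}

// True when the estimate exceeds the 10-minute limit.
function wouldRequireStreaming(maxTokens: number): boolean {
  return estimatedMinutes(maxTokens) > LIMIT_MINUTES
}

console.log(estimatedMinutes(21_000), wouldRequireStreaming(21_000))
console.log(estimatedMinutes(64_000), wouldRequireStreaming(64_000))
```

Under that assumption, the clamped default of 21_000 tokens estimates to about 9.84 minutes and passes, while an unclamped 64K default estimates to 30 minutes and would be refused before the request is sent.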
6 changes: 6 additions & 0 deletions docs/adapters/anthropic.md
@@ -136,6 +136,12 @@ const stream = chat({

> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options).

#### `max_tokens` default

Anthropic's Messages API _requires_ `max_tokens` on every request, so the adapter always sends a value. When you don't set `modelOptions.max_tokens`, it defaults to the selected model's full output ceiling (`max_output_tokens` from the model metadata — e.g. 64K for Sonnet, 128K for Opus), falling back to a safe constant for unrecognized models. `max_tokens` is a ceiling, not a reservation — billing is on tokens actually generated — so this default costs nothing extra and avoids the silent mid-response truncation (`stop_reason: "max_tokens"`) that a low default would cause. Set `max_tokens` explicitly only when you want to _cap_ output below the model ceiling. If a response is truncated while using the default cap, the adapter logs a warning (visible with [debug logging](../advanced/debug-logging) enabled).

Comment on lines +139 to +142


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Fix heading level to maintain document hierarchy.

The new #### max_tokens default heading skips a level. It follows ## Model Options and precedes ### Thinking (Extended Thinking), so it should be ### max_tokens default to increment by one level at a time.

📝 Proposed fix
-#### `max_tokens` default
+### `max_tokens` default
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 139-139: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/adapters/anthropic.md` around lines 139 - 142, The `max_tokens` default
section in the Anthropic adapter docs skips a heading level and breaks the
document hierarchy. Update the heading used for this section in
`docs/adapters/anthropic.md` from the current `####` level to `###` so it sits
correctly between `Model Options` and `Thinking (Extended Thinking)`.

Source: Linters/SAST tools

One exception: structured output (`chat({ outputSchema })`) on models that use the non-streaming finalization path clamps this default to ~21K tokens. The Anthropic SDK rejects a non-streaming request whose `max_tokens` could exceed its 10-minute timeout, so the full ceiling can't be used there. Streaming chat is unaffected. To raise the structured-output ceiling toward a model's true max, stream the response.
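
Taken together, the rules in this section amount to a short resolution order. A rough sketch, assuming only the behavior documented here: `resolveMaxTokens`, `ResolveInput`, and the two constants are illustrative names, with values taken from the figures in this PR (a 64K fallback and a 21_000 non-streaming cap).

```typescript
// Hypothetical sketch of the documented resolution order; names are
// illustrative and not the adapter's actual exports.
const FALLBACK_CEILING = 64_000 // used when the model id is unrecognized
const NON_STREAMING_DEFAULT_CAP = 21_000 // applies to the default only

interface ResolveInput {
  explicitMaxTokens?: number // caller's modelOptions.max_tokens, if any
  modelCeiling?: number // model's max_output_tokens, if known
  stream: boolean // false on the non-streaming structured-output path
}

function resolveMaxTokens(input: ResolveInput): number {
  // 1. An explicit value always wins and is never clamped.
  if (input.explicitMaxTokens != null) return input.explicitMaxTokens
  // 2. Otherwise use the model's ceiling, or the fallback if unknown.
  const ceiling = input.modelCeiling ?? FALLBACK_CEILING
  // 3. Clamp the default on non-streaming requests.
  return input.stream ? ceiling : Math.min(ceiling, NON_STREAMING_DEFAULT_CAP)
}

console.log(resolveMaxTokens({ modelCeiling: 128_000, stream: true }))
console.log(resolveMaxTokens({ modelCeiling: 128_000, stream: false }))
```

Because the clamp touches only the default, an explicit oversized value on the non-streaming path passes through unchanged and the SDK's own "streaming required" error remains the signal to the caller.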

### Thinking (Extended Thinking)

Enable extended thinking with a token budget. This allows Claude to show its reasoning process, which is streamed as `thinking` chunks:
3 changes: 2 additions & 1 deletion docs/config.json
@@ -431,7 +431,8 @@
{
"label": "Anthropic",
"to": "adapters/anthropic",
-"addedAt": "2026-04-15"
"addedAt": "2026-04-15",
"updatedAt": "2026-06-26"
},
{
"label": "Google Gemini",
42 changes: 39 additions & 3 deletions packages/ai-anthropic/src/adapters/text.ts
@@ -10,7 +10,10 @@ import {
generateId,
getAnthropicApiKeyFromEnv,
} from '../utils'
-import { ANTHROPIC_COMBINED_TOOLS_AND_SCHEMA_MODELS } from '../model-meta'
import {
ANTHROPIC_COMBINED_TOOLS_AND_SCHEMA_MODELS,
getAnthropicDefaultMaxTokens,
} from '../model-meta'
import type {
ANTHROPIC_MODELS,
AnthropicChatModelProviderOptionsByName,
@@ -263,7 +266,12 @@ export class AnthropicTextAdapter<
const { chatOptions, outputSchema } = options
const { logger } = chatOptions

-const requestParams = this.mapCommonOptionsToAnthropic(chatOptions)
// `structuredOutput()` issues a non-streaming `messages.create({ stream:
// false })` below, so the defaulted `max_tokens` must stay under the SDK's
// non-streaming 10-minute guard (issue #849) — pass `stream: false`.
const requestParams = this.mapCommonOptionsToAnthropic(chatOptions, {
stream: false,
})

// Create a tool that will capture the structured output
// Anthropic's SDK requires input_schema with type: 'object' literal
@@ -352,6 +360,7 @@ export class AnthropicTextAdapter<

private mapCommonOptionsToAnthropic(
options: TextOptions<AnthropicTextProviderOptions>,
{ stream = true }: { stream?: boolean } = {},
) {
const modelOptions = options.modelOptions

@@ -420,7 +429,18 @@ export class AnthropicTextAdapter<
validProviderOptions.thinking?.type === 'enabled'
? validProviderOptions.thinking.budget_tokens
: undefined
-const defaultMaxTokens = modelOptions?.max_tokens ?? 1024
// Anthropic's Messages API *requires* `max_tokens`, so we must always send a
// value. When the caller doesn't specify one, default to the resolved
// model's real output ceiling (from model-meta) rather than a low constant
// that silently truncates long responses with `stop_reason: "max_tokens"`
// (issue #849). `max_tokens` is a ceiling, not a reservation — billing is on
// tokens actually generated, so a higher default costs nothing extra.
// For non-streaming requests (the `structuredOutput()` path) the default is
// clamped to the SDK's non-streaming-safe limit so it doesn't trip the
// "streaming required" 10-minute guard — see getAnthropicDefaultMaxTokens.
const defaultMaxTokens =
modelOptions?.max_tokens ??
getAnthropicDefaultMaxTokens(this.model, { stream })
const maxTokens =
thinkingBudget && thinkingBudget >= defaultMaxTokens
? thinkingBudget + 1
@@ -1181,6 +1201,22 @@ export class AnthropicTextAdapter<
break
}
case 'max_tokens': {
// Surface a warning when the truncating cap was the
// adapter-supplied default (caller didn't pass `max_tokens`), so
// the truncation isn't silently attributed to the model "doing
// nothing" (issue #849). When the caller set `max_tokens`
// themselves, hitting it is their own deliberate ceiling.
if (options.modelOptions?.max_tokens == null) {
const defaultedMaxTokens = getAnthropicDefaultMaxTokens(model)
logger.warn(
`anthropic response truncated at the default max_tokens (${defaultedMaxTokens}) for model=${model}; set modelOptions.max_tokens explicitly to change the output ceiling`,
{
source: 'anthropic.processAnthropicStream',
model,
defaultedMaxTokens,
},
)
}
yield {
type: EventType.RUN_ERROR,
model,
81 changes: 81 additions & 0 deletions packages/ai-anthropic/src/model-meta.ts
@@ -733,6 +733,87 @@ export const ANTHROPIC_MODELS = [
CLAUDE_OPUS_4_8_FAST.id,
] as const

/**
* Fallback `max_tokens` ceiling for a model whose metadata carries no
* `max_output_tokens` (e.g. an unrecognized model id). Anthropic's Messages
* API *requires* `max_tokens`, so the adapter must always send a value. 64K is
* the output ceiling of the current mainstream Claude tier (Sonnet/Haiku 4.5),
* so it's a sane default for an unknown — almost certainly modern — model and
* avoids silently truncating long generations (issue #849). Recognized models
* use their exact `max_output_tokens` from {@link ANTHROPIC_MODEL_MAX_OUTPUT_TOKENS}
* (e.g. 128K for Opus), so this fallback only ever applies to ids not in the
* map.
*/
export const ANTHROPIC_DEFAULT_MAX_OUTPUT_TOKENS = 64_000

/**
* Runtime lookup of each model's maximum output-token ceiling, keyed by model
* id. Lets the text adapter default the required `max_tokens` request field to
* the model's real ceiling when the caller doesn't specify one, rather than a
* low constant that truncates responses mid-stream (issue #849).
*
* Kept in sync with {@link ANTHROPIC_MODELS} by `scripts/sync-provider-models.ts`
* — when that script adds a model it also inserts the model's `max_output_tokens`
* here, so a freshly-synced model resolves to its real ceiling rather than the
* fallback above.
*/
const ANTHROPIC_MODEL_MAX_OUTPUT_TOKENS: Record<string, number> = {
[CLAUDE_OPUS_4_6.id]: CLAUDE_OPUS_4_6.max_output_tokens,
[CLAUDE_OPUS_4_5.id]: CLAUDE_OPUS_4_5.max_output_tokens,
[CLAUDE_SONNET_4_6.id]: CLAUDE_SONNET_4_6.max_output_tokens,
[CLAUDE_SONNET_4_5.id]: CLAUDE_SONNET_4_5.max_output_tokens,
[CLAUDE_HAIKU_4_5.id]: CLAUDE_HAIKU_4_5.max_output_tokens,
[CLAUDE_OPUS_4_1.id]: CLAUDE_OPUS_4_1.max_output_tokens,
[CLAUDE_SONNET_4.id]: CLAUDE_SONNET_4.max_output_tokens,
[CLAUDE_SONNET_3_7.id]: CLAUDE_SONNET_3_7.max_output_tokens,
[CLAUDE_OPUS_4.id]: CLAUDE_OPUS_4.max_output_tokens,
[CLAUDE_HAIKU_3_5.id]: CLAUDE_HAIKU_3_5.max_output_tokens,
[CLAUDE_HAIKU_3.id]: CLAUDE_HAIKU_3.max_output_tokens,
[CLAUDE_OPUS_4_6_FAST.id]: CLAUDE_OPUS_4_6_FAST.max_output_tokens,
[CLAUDE_OPUS_4_7.id]: CLAUDE_OPUS_4_7.max_output_tokens,
[CLAUDE_OPUS_4_7_FAST.id]: CLAUDE_OPUS_4_7_FAST.max_output_tokens,
[CLAUDE_OPUS_4_8.id]: CLAUDE_OPUS_4_8.max_output_tokens,
[CLAUDE_OPUS_4_8_FAST.id]: CLAUDE_OPUS_4_8_FAST.max_output_tokens,
}

/**
* Largest `max_tokens` the Anthropic SDK permits on a **non-streaming**
* request. The SDK refuses to make a non-streaming call it estimates could
* exceed its 10-minute timeout, computed as
* `(60min * max_tokens) / 128_000 > 10min` — i.e. it throws
* `"Streaming is required for operations that may take longer than 10 minutes"`
* once `max_tokens > 128_000 * 10 / 60 ≈ 21_333`
* (`@anthropic-ai/sdk`'s `calculateNonstreamingTimeout`). The text adapter's
* only non-streaming call is the forced-tool `structuredOutput()` request, so
* its defaulted ceiling must stay at or below this; the streaming chat path
* keeps the model's full {@link getAnthropicDefaultMaxTokens} ceiling. We sit
* just under the boundary (`21_333` lands within a rounding error of 10min). This
* caps only the *default* — an explicit oversized `max_tokens` from the caller
* still surfaces the SDK's "use streaming" error, which is the correct signal.
*/
export const ANTHROPIC_MAX_NONSTREAMING_TOKENS = 21_000

/**
* Resolve the default `max_tokens` for a model: its known `max_output_tokens`
* ceiling, or {@link ANTHROPIC_DEFAULT_MAX_OUTPUT_TOKENS} for unknown models.
* Callers that pass an explicit `max_tokens` bypass this entirely.
*
* Pass `stream: false` for non-streaming requests (the `structuredOutput()`
* path): the result is then clamped to {@link ANTHROPIC_MAX_NONSTREAMING_TOKENS}
* so the defaulted ceiling doesn't trip the SDK's non-streaming 10-minute guard
* (issue #849). Streaming requests (the default) are unaffected and get the
* model's full ceiling.
*/
export function getAnthropicDefaultMaxTokens(
model: string,
{ stream = true }: { stream?: boolean } = {},
): number {
const ceiling =
ANTHROPIC_MODEL_MAX_OUTPUT_TOKENS[model] ??
ANTHROPIC_DEFAULT_MAX_OUTPUT_TOKENS
return stream ? ceiling : Math.min(ceiling, ANTHROPIC_MAX_NONSTREAMING_TOKENS)
}

/**
* Anthropic models that support combining `tools` + JSON-Schema-constrained
* output in a single streaming Messages request (per issue #605). GA'd
133 changes: 131 additions & 2 deletions packages/ai-anthropic/tests/anthropic-adapter.test.ts
@@ -2,6 +2,7 @@ import { describe, it, expect, beforeEach, vi } from 'vitest'
import { chat, type Tool, type StreamChunk } from '@tanstack/ai'
import { AnthropicTextAdapter } from '../src/adapters/text'
import type { AnthropicTextProviderOptions } from '../src/adapters/text'
import { ANTHROPIC_MAX_NONSTREAMING_TOKENS } from '../src/model-meta'


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Reorder import to satisfy import/order.

ESLint flags this: the value import of ../src/model-meta should precede the type import of ../src/adapters/text. This will fail lint in CI.

🧰 Tools
🪛 ESLint

[error] 5-5: ../src/model-meta import should occur before type import of ../src/adapters/text

(import/order)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-anthropic/tests/anthropic-adapter.test.ts` at line 5, The imports
in anthropic-adapter.test.ts are out of the required order and will fail the
import/order lint rule. Reorder the value import from model-meta so it comes
before the type-only import from adapters/text, keeping the existing symbols
like ANTHROPIC_MAX_NONSTREAMING_TOKENS and the AnthropicTextProviderOptions type import intact.

Source: Linters/SAST tools

import { z } from 'zod'

const mocks = vi.hoisted(() => {
@@ -444,7 +445,7 @@ describe('Anthropic adapter option mapping', () => {
expect(payload.top_p).toBe(0.7)
})

-it('defaults max_tokens to 1024 when not provided via modelOptions', async () => {
it("defaults max_tokens to the model's max_output_tokens when not provided via modelOptions (#849)", async () => {
mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok'))

const adapter = createAdapter('claude-3-7-sonnet')
@@ -457,7 +458,135 @@ describe('Anthropic adapter option mapping', () => {
}

const [payload] = mocks.betaMessagesCreate.mock.calls[0]!
-expect(payload.max_tokens).toBe(1024)
// claude-3-7-sonnet's model-meta max_output_tokens is 64_000, not the old
// hard-coded 1024 default that silently truncated long responses.
expect(payload.max_tokens).toBe(64_000)
})

it('warns when the default max_tokens cap truncates the response (#849)', async () => {
// Stream that ends with stop_reason: "max_tokens" — the model hit the cap.
const truncatedStream = (async function* () {
yield {
type: 'content_block_start',
index: 0,
content_block: { type: 'text', text: '' },
}
yield {
type: 'content_block_delta',
index: 0,
delta: { type: 'text_delta', text: 'partial output' },
}
yield { type: 'content_block_stop', index: 0 }
yield {
type: 'message_delta',
delta: { stop_reason: 'max_tokens' },
usage: { output_tokens: 64_000 },
}
yield { type: 'message_stop' }
})()
mocks.betaMessagesCreate.mockResolvedValueOnce(truncatedStream)

const adapter = createAdapter('claude-3-7-sonnet')

const logger = {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
}

for await (const _ of chat({
adapter,
messages: [{ role: 'user', content: 'Write a long essay' }],
debug: { logger, errors: true },
})) {
// consume stream
}

const truncationWarning = logger.warn.mock.calls.find((call) =>
String(call[0]).includes('truncated at the default max_tokens'),
)
expect(truncationWarning).toBeDefined()
})

it('does not warn about truncation when the caller set max_tokens explicitly (#849)', async () => {
const truncatedStream = (async function* () {
yield {
type: 'message_delta',
delta: { stop_reason: 'max_tokens' },
usage: { output_tokens: 100 },
}
yield { type: 'message_stop' }
})()
mocks.betaMessagesCreate.mockResolvedValueOnce(truncatedStream)

const adapter = createAdapter('claude-3-7-sonnet')

const logger = {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
}

for await (const _ of chat({
adapter,
messages: [{ role: 'user', content: 'Hi' }],
modelOptions: { max_tokens: 100 } satisfies AnthropicTextProviderOptions,
debug: { logger, errors: true },
})) {
// consume stream
}

const truncationWarning = logger.warn.mock.calls.find((call) =>
String(call[0]).includes('truncated at the default max_tokens'),
)
expect(truncationWarning).toBeUndefined()
})

it('clamps the default max_tokens on the non-streaming structured-output path so it never trips the SDK 10-minute guard (#849)', async () => {
// The structured-output fallback issues a NON-streaming
// `messages.create({ stream: false })`. The Anthropic SDK throws
// "Streaming is required for operations that may take longer than 10
// minutes" once max_tokens exceeds ~21_333, so the defaulted ceiling must
// be clamped here even though the streaming chat path keeps the full 64K.
mocks.betaMessagesCreate.mockResolvedValueOnce({
id: 'msg_structured',
type: 'message',
role: 'assistant',
model: 'claude-3-7-sonnet',
content: [
{
type: 'tool_use',
id: 'toolu_structured_output',
name: 'structured_output',
input: { recommendation: 'Strat', price: 1299 },
},
],
stop_reason: 'tool_use',
usage: { input_tokens: 10, output_tokens: 20 },
})

const adapter = createAdapter('claude-3-7-sonnet')

for await (const _ of chat({
adapter,
messages: [{ role: 'user', content: 'recommend a guitar as json' }],
outputSchema: z.object({
recommendation: z.string(),
price: z.number(),
}),
stream: true,
})) {
// consume stream
}

const [payload] = mocks.betaMessagesCreate.mock.calls[0]!
expect(payload.stream).toBe(false)
// Clamped to the non-streaming limit — NOT claude-3-7-sonnet's full 64K
// streaming ceiling, which would make the SDK throw before the request.
expect(payload.max_tokens).toBe(ANTHROPIC_MAX_NONSTREAMING_TOKENS)
expect(payload.max_tokens).toBeLessThanOrEqual(21_333)
})

it('native combined mode (#605): wires outputSchema into output_format alongside tools on Claude 4.5+', async () => {