fix(anthropic): default max_tokens to the model's output ceiling (#849) #853
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| --- | ||
| '@tanstack/ai-anthropic': patch | ||
| --- | ||
|
|
||
| Default Anthropic `max_tokens` to the selected model's real output ceiling | ||
| (`max_output_tokens` from model metadata — e.g. 64K for Sonnet, 128K for Opus) | ||
| when the caller doesn't pass one, instead of a hard-coded `1024` that silently | ||
| truncated long responses with `stop_reason: "max_tokens"` (#849). Unknown | ||
| models fall back to a safe constant. `max_tokens` is a ceiling, not a | ||
| reservation, so this costs nothing unless the model genuinely produces more. | ||
|
|
||
| The adapter also now logs a warning when a response is truncated while using the | ||
| defaulted (caller-unspecified) cap, so the truncation isn't silently attributed | ||
| to the model "doing nothing". Callers that set `modelOptions.max_tokens` | ||
| explicitly are unaffected. | ||
|
|
||
| The non-streaming structured-output path (`structuredOutput()`) clamps this | ||
| default to the Anthropic SDK's non-streaming-safe limit (~21K tokens). The SDK | ||
| refuses a non-streaming request whose `max_tokens` could exceed its 10-minute | ||
| timeout, so without the clamp the full-ceiling default would make every | ||
| `chat({ outputSchema })` call on a fallback-path model throw "Streaming is | ||
| required for operations that may take longer than 10 minutes". The streaming | ||
| chat path keeps the model's full ceiling. |
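The defaulting and clamping rules described in this changeset can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `resolveMaxTokens`, `MODEL_META`, `FALLBACK_MAX_TOKENS`, and `MAX_NONSTREAMING_TOKENS` are hypothetical names, and the fallback value is an assumption because the PR text only says "a safe constant". The ~21K clamp is consistent with a guard that budgets one hour per 128,000 output tokens against a 10-minute timeout (600 / 3600 × 128,000 ≈ 21,333), but that formula is an assumption here rather than something quoted from the PR.

```typescript
// Sketch only: names and constants are illustrative, not the adapter's real code.
interface ModelMeta {
  max_output_tokens: number
}

// Hypothetical metadata table; 64_000 for claude-3-7-sonnet matches the test in this PR.
const MODEL_META: Record<string, ModelMeta> = {
  'claude-3-7-sonnet': { max_output_tokens: 64_000 },
}

// Assumed values: the PR shows neither the fallback constant nor the exact clamp.
const FALLBACK_MAX_TOKENS = 4_096
const MAX_NONSTREAMING_TOKENS = 21_333

interface ResolvedMaxTokens {
  maxTokens: number
  defaulted: boolean // true when the caller did not pass max_tokens
}

function resolveMaxTokens(
  model: string,
  explicit: number | undefined,
  streaming: boolean,
): ResolvedMaxTokens {
  // An explicit caller value is passed through untouched.
  if (explicit !== undefined) {
    return { maxTokens: explicit, defaulted: false }
  }
  // Otherwise use the model's output ceiling, or the fallback for unknown models.
  const ceiling = MODEL_META[model]?.max_output_tokens ?? FALLBACK_MAX_TOKENS
  // Only the non-streaming path clamps the default.
  const maxTokens = streaming
    ? ceiling
    : Math.min(ceiling, MAX_NONSTREAMING_TOKENS)
  return { maxTokens, defaulted: true }
}
```

Returning the `defaulted` flag alongside the value is one way to let later code tell a defaulted cap apart from a caller-chosen one, which the truncation warning depends on.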
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,6 +2,7 @@ import { describe, it, expect, beforeEach, vi } from 'vitest' | |
| import { chat, type Tool, type StreamChunk } from '@tanstack/ai' | ||
| import { AnthropicTextAdapter } from '../src/adapters/text' | ||
| import type { AnthropicTextProviderOptions } from '../src/adapters/text' | ||
| import { ANTHROPIC_MAX_NONSTREAMING_TOKENS } from '../src/model-meta' | ||
|
Contributor
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Reorder import to satisfy `import/order`. ESLint flags the value import on line 5.
🧰 Tools
🪛 ESLint
[error] 5-5: (import/order)
Source: Linters/SAST tools
||
| import { z } from 'zod' | ||
|
|
||
| const mocks = vi.hoisted(() => { | ||
|
|
@@ -444,7 +445,7 @@ describe('Anthropic adapter option mapping', () => { | |
| expect(payload.top_p).toBe(0.7) | ||
| }) | ||
|
|
||
| - it('defaults max_tokens to 1024 when not provided via modelOptions', async () => { | ||
| it("defaults max_tokens to the model's max_output_tokens when not provided via modelOptions (#849)", async () => { | ||
| mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok')) | ||
|
|
||
| const adapter = createAdapter('claude-3-7-sonnet') | ||
|
|
@@ -457,7 +458,135 @@ describe('Anthropic adapter option mapping', () => { | |
| } | ||
|
|
||
| const [payload] = mocks.betaMessagesCreate.mock.calls[0]! | ||
| - expect(payload.max_tokens).toBe(1024) | ||
| // claude-3-7-sonnet's model-meta max_output_tokens is 64_000 — not the old | ||
| // hard-coded 1024 floor that silently truncated long responses. | ||
| expect(payload.max_tokens).toBe(64_000) | ||
| }) | ||
|
|
||
| it('warns when the default max_tokens cap truncates the response (#849)', async () => { | ||
| // Stream that ends with stop_reason: "max_tokens" — the model hit the cap. | ||
| const truncatedStream = (async function* () { | ||
| yield { | ||
| type: 'content_block_start', | ||
| index: 0, | ||
| content_block: { type: 'text', text: '' }, | ||
| } | ||
| yield { | ||
| type: 'content_block_delta', | ||
| index: 0, | ||
| delta: { type: 'text_delta', text: 'partial output' }, | ||
| } | ||
| yield { type: 'content_block_stop', index: 0 } | ||
| yield { | ||
| type: 'message_delta', | ||
| delta: { stop_reason: 'max_tokens' }, | ||
| usage: { output_tokens: 64_000 }, | ||
| } | ||
| yield { type: 'message_stop' } | ||
| })() | ||
| mocks.betaMessagesCreate.mockResolvedValueOnce(truncatedStream) | ||
|
|
||
| const adapter = createAdapter('claude-3-7-sonnet') | ||
|
|
||
| const logger = { | ||
| debug: vi.fn(), | ||
| info: vi.fn(), | ||
| warn: vi.fn(), | ||
| error: vi.fn(), | ||
| } | ||
|
|
||
| for await (const _ of chat({ | ||
| adapter, | ||
| messages: [{ role: 'user', content: 'Write a long essay' }], | ||
| debug: { logger, errors: true }, | ||
| })) { | ||
| // consume stream | ||
| } | ||
|
|
||
| const truncationWarning = logger.warn.mock.calls.find((call) => | ||
| String(call[0]).includes('truncated at the default max_tokens'), | ||
| ) | ||
| expect(truncationWarning).toBeDefined() | ||
| }) | ||
|
|
||
| it('does not warn about truncation when the caller set max_tokens explicitly (#849)', async () => { | ||
| const truncatedStream = (async function* () { | ||
| yield { | ||
| type: 'message_delta', | ||
| delta: { stop_reason: 'max_tokens' }, | ||
| usage: { output_tokens: 100 }, | ||
| } | ||
| yield { type: 'message_stop' } | ||
| })() | ||
| mocks.betaMessagesCreate.mockResolvedValueOnce(truncatedStream) | ||
|
|
||
| const adapter = createAdapter('claude-3-7-sonnet') | ||
|
|
||
| const logger = { | ||
| debug: vi.fn(), | ||
| info: vi.fn(), | ||
| warn: vi.fn(), | ||
| error: vi.fn(), | ||
| } | ||
|
|
||
| for await (const _ of chat({ | ||
| adapter, | ||
| messages: [{ role: 'user', content: 'Hi' }], | ||
| modelOptions: { max_tokens: 100 } satisfies AnthropicTextProviderOptions, | ||
| debug: { logger, errors: true }, | ||
| })) { | ||
| // consume stream | ||
| } | ||
|
|
||
| const truncationWarning = logger.warn.mock.calls.find((call) => | ||
| String(call[0]).includes('truncated at the default max_tokens'), | ||
| ) | ||
| expect(truncationWarning).toBeUndefined() | ||
| }) | ||
|
|
||
| it('clamps the default max_tokens on the non-streaming structured-output path so it never trips the SDK 10-minute guard (#849)', async () => { | ||
| // The structured-output fallback issues a NON-streaming | ||
| // `messages.create({ stream: false })`. The Anthropic SDK throws | ||
| // "Streaming is required for operations that may take longer than 10 | ||
| // minutes" once max_tokens exceeds ~21_333, so the defaulted ceiling must | ||
| // be clamped here even though the streaming chat path keeps the full 64K. | ||
| mocks.betaMessagesCreate.mockResolvedValueOnce({ | ||
| id: 'msg_structured', | ||
| type: 'message', | ||
| role: 'assistant', | ||
| model: 'claude-3-7-sonnet', | ||
| content: [ | ||
| { | ||
| type: 'tool_use', | ||
| id: 'toolu_structured_output', | ||
| name: 'structured_output', | ||
| input: { recommendation: 'Strat', price: 1299 }, | ||
| }, | ||
| ], | ||
| stop_reason: 'tool_use', | ||
| usage: { input_tokens: 10, output_tokens: 20 }, | ||
| }) | ||
|
|
||
| const adapter = createAdapter('claude-3-7-sonnet') | ||
|
|
||
| for await (const _ of chat({ | ||
| adapter, | ||
| messages: [{ role: 'user', content: 'recommend a guitar as json' }], | ||
| outputSchema: z.object({ | ||
| recommendation: z.string(), | ||
| price: z.number(), | ||
| }), | ||
| stream: true, | ||
| })) { | ||
| // consume stream | ||
| } | ||
|
|
||
| const [payload] = mocks.betaMessagesCreate.mock.calls[0]! | ||
| expect(payload.stream).toBe(false) | ||
| // Clamped to the non-streaming limit — NOT claude-3-7-sonnet's full 64K | ||
| // streaming ceiling, which would make the SDK throw before the request. | ||
| expect(payload.max_tokens).toBe(ANTHROPIC_MAX_NONSTREAMING_TOKENS) | ||
| expect(payload.max_tokens).toBeLessThanOrEqual(21_333) | ||
| }) | ||
|
|
||
| it('native combined mode (#605): wires outputSchema into output_format alongside tools on Claude 4.5+', async () => { | ||
|
|
||
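The two truncation tests above exercise one decision: warn only when the stream stopped with `stop_reason: "max_tokens"` and the cap was defaulted rather than set by the caller. A minimal sketch of that decision follows. The helper name `warnIfDefaultCapTruncated`, its parameters, and the exact message wording are assumptions; only the substring `truncated at the default max_tokens` comes from the test assertions.

```typescript
// Sketch only: a hypothetical helper, not the adapter's real implementation.
interface WarnLogger {
  warn: (message: string) => void
}

function warnIfDefaultCapTruncated(
  stopReason: string | null | undefined,
  maxTokens: number,
  maxTokensWasDefaulted: boolean,
  logger: WarnLogger,
): boolean {
  // Stay silent unless the model hit the cap AND the cap was the adapter's default.
  if (stopReason !== 'max_tokens' || !maxTokensWasDefaulted) {
    return false
  }
  logger.warn(
    `Response truncated at the default max_tokens (${maxTokens}); ` +
      'set modelOptions.max_tokens explicitly to change the cap.',
  )
  return true
}
```

Under these assumptions, a caller that passes `modelOptions.max_tokens` never triggers the warning, even if the response is cut off, which matches the second test.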
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Fix heading level to maintain document hierarchy.
The new `#### max_tokens default` heading skips a level. It follows `## Model Options` and precedes `### Thinking (Extended Thinking)`, so it should be `### max_tokens default` to increment by one level at a time.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 139-139: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4
(MD001, heading-increment)
Source: Linters/SAST tools