
ai-anthropic: max_tokens defaults to 1024, silently truncating responses when caller doesn't set it #849

Description

@tombeckenham

Summary

@tanstack/ai-anthropic's text adapter defaults the Anthropic max_tokens request field to 1024 when the caller doesn't pass one. For any non-trivial generation (codegen, agentic tool flows, long-form output) this silently truncates the response: it comes back with stop_reason: "max_tokens", often mid-tool-call, so the run looks like it "failed to do anything" rather than "ran out of output budget".

This is Anthropic-specific. Anthropic's Messages API requires max_tokens, so the adapter must send some value — but 1024 is far below what the targeted models can produce (Sonnet/Opus support 64K–128K output), and the package already knows each model's real ceiling.

Where

packages/ai-anthropic/src/adapters/text.ts:423

const defaultMaxTokens = modelOptions?.max_tokens ?? 1024
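Pulled out into a standalone function, the quoted line makes the behaviour easy to see. This is a minimal sketch: the `currentDefault` name and the `ModelOptions` interface are hypothetical stand-ins, and only the `?? 1024` expression comes from the adapter. Whenever max_tokens is unset, every model gets the same 1024 cap, whatever its real output ceiling is.

```typescript
// Hypothetical stand-in for the adapter's modelOptions; only the
// optional max_tokens field matters for this illustration.
interface ModelOptions {
  max_tokens?: number;
}

// Mirrors the quoted line: an unset max_tokens falls back to 1024,
// independent of which model is being called.
function currentDefault(modelOptions?: ModelOptions): number {
  return modelOptions?.max_tokens ?? 1024;
}

console.log(currentDefault()); // 1024
console.log(currentDefault({})); // 1024
console.log(currentDefault({ max_tokens: 16_000 })); // 16000
```

Because `??` only falls back on `null`/`undefined`, an explicit caller value always passes through untouched. That is why setting it manually works around the problem.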

Why 1024 is the wrong default

  • It's a ceiling, not a reservation. Billing is on tokens actually generated, so a higher default costs nothing unless the model genuinely produces more. The only effect of a low default is truncation.
  • The data for a better default already exists. packages/ai-anthropic/src/model-meta.ts carries max_output_tokens per model (e.g. 128_000, 32_000). The default ignores it and hard-codes 1024.
  • It's inconsistent across adapters. @tanstack/ai-openai has no equivalent ?? 1024 floor (OpenAI treats max_tokens as optional and defaults to the model max), so callers only get bitten on Anthropic.
  • The failure mode is opaque. It surfaces as a confusing incomplete/failed agent run, not an obvious "you hit max_tokens" — even though the adapter already has a case "max_tokens" branch and therefore knows it truncated.
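Until the default changes, a caller can at least make truncation loud instead of opaque. This is a minimal sketch that assumes only that the final stop_reason string from the Messages API is available to the caller. `TruncatedResponseError` and `assertNotTruncated` are hypothetical helpers, not part of @tanstack/ai-anthropic.

```typescript
// Hypothetical error type so truncation is distinguishable from other failures.
class TruncatedResponseError extends Error {
  readonly maxTokens: number;

  constructor(maxTokens: number) {
    super(
      `Response stopped on stop_reason "max_tokens" (cap: ${maxTokens}). ` +
        `Output is incomplete; raise max_tokens and retry.`,
    );
    this.name = "TruncatedResponseError";
    this.maxTokens = maxTokens;
  }
}

// Throws when the model stopped because it exhausted its output budget,
// so a cut-off tool call doesn't look like a silent no-op.
function assertNotTruncated(stopReason: string | null, maxTokens: number): void {
  if (stopReason === "max_tokens") {
    throw new TruncatedResponseError(maxTokens);
  }
}
```

For example, `assertNotTruncated("end_turn", 1024)` returns normally. `assertNotTruncated("max_tokens", 1024)` throws an error that names the cap, which points straight at the cause.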

Reproduction

Call chat({ adapter: anthropicText('claude-sonnet-4-5'), ... }) (or createAnthropicChat) with a prompt that asks the model to write a file or otherwise produce more than ~1024 tokens of output, and don't set maxTokens. The stream ends early with stop_reason: "max_tokens" and the tool/file output is cut off mid-stream.

Setting maxTokens explicitly on the chat() call works around it, which confirms the default is the cause.

Proposed fix

  1. Derive the default max_tokens from ModelMeta.max_output_tokens for the resolved model when the caller doesn't specify one, falling back to a sane constant only for unknown models. Optionally, cap that default at a lower ceiling so an unspecified call can't accidentally request the full 128K. Since max_tokens is a ceiling rather than a reservation, and large values should be streamed anyway, defaulting to the model max is also defensible.
  2. When a response stops on stop_reason: "max_tokens" while using the defaulted (caller-unspecified) cap, emit a warning so truncation isn't silent.
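Both steps can be sketched together. This minimal version assumes a per-model record shaped like the max_output_tokens entries in model-meta.ts. All of the following are hypothetical names for illustration, not the package's actual exports: `MODEL_MAX_OUTPUT` (with placeholder model ids and the example values from above), `FALLBACK_MAX_TOKENS` (an arbitrary value), `resolveMaxTokens`, and `warnIfTruncatedByDefault`. The optional cap from step 1 is modelled as a parameter.

```typescript
// Illustrative stand-in for the per-model max_output_tokens data in
// model-meta.ts. Model ids are placeholders, not real model names.
const MODEL_MAX_OUTPUT: Record<string, number> = {
  "example-large-model": 128_000,
  "example-small-model": 32_000,
};

// Hypothetical constant, used only for models missing from the table.
const FALLBACK_MAX_TOKENS = 4_096;

interface ResolvedMaxTokens {
  maxTokens: number;
  // True when the caller did not pass max_tokens and a default was applied.
  defaulted: boolean;
}

// Step 1: prefer the caller's value, then the model's own ceiling
// (optionally clamped by `cap`), then a constant for unknown models.
function resolveMaxTokens(
  model: string,
  callerMaxTokens?: number,
  cap?: number,
): ResolvedMaxTokens {
  if (callerMaxTokens !== undefined) {
    return { maxTokens: callerMaxTokens, defaulted: false };
  }
  const modelMax = MODEL_MAX_OUTPUT[model] ?? FALLBACK_MAX_TOKENS;
  const maxTokens = cap !== undefined ? Math.min(modelMax, cap) : modelMax;
  return { maxTokens, defaulted: true };
}

// Step 2: warn only when the response hit a cap the caller never chose.
// Returns true if a warning was emitted.
function warnIfTruncatedByDefault(
  stopReason: string | null,
  resolved: ResolvedMaxTokens,
  warn: (msg: string) => void = console.warn,
): boolean {
  if (stopReason === "max_tokens" && resolved.defaulted) {
    warn(
      `Response truncated at the default max_tokens (${resolved.maxTokens}); ` +
        `pass max_tokens explicitly to raise the limit or silence this warning.`,
    );
    return true;
  }
  return false;
}
```

The `defaulted` flag carries the key design choice. Step 2 stays quiet when the caller deliberately picked a small cap, and it warns only when truncation came from a value the caller never saw.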

Happy to open a PR with the model-meta-aware default + truncation warning (plus the docs/skill updates the repo conventions call for). Wanted to confirm the approach in an issue first.

Environment

  • @tanstack/ai-anthropic — present on main (v0.15.8) and on the published 0.10.1.
