
feat: add tool_not_found error handler to recover from hallucinated tool calls (#325) #2957

Draft
sjhddh wants to merge 1 commit into openai:main from sjhddh:sunny/issue-325-tool-not-found-handler

Conversation


@sjhddh sjhddh commented Apr 19, 2026

Summary

Fixes #325. When the model calls a tool that isn't registered on the agent, the SDK currently raises ModelBehaviorError and kills the run, discarding every turn that came before. Users on the two-plus-year-old issue report losing multi-minute DeepSearch-style runs to a single typo. This PR adds a recoverable escape hatch.

What's new

Register a handler alongside the existing max_turns one:

def on_tool_not_found(data: ToolNotFoundErrorHandlerInput[None]) -> ToolNotFoundAction:
    return ToolNotFoundAction(
        error_message=f"Tool {data.tool_name!r} does not exist. Available: {data.available_tools}."
    )

Runner.run_sync(agent, "...", error_handlers={"tool_not_found": on_tool_not_found})

The runner pre-scans the model response for unknown tool calls, invokes the handler (sync or async) once per unknown call, synthesizes a function_call_output with the handler's message, and continues the turn. The model sees the error on its next step and self-corrects.

Contract

  • Return ToolNotFoundAction(error_message=...) to recover; return None or register nothing to preserve today's raise behavior.
  • Recovery is bounded by max_turns, so repeated hallucinations still terminate.
  • LiteLLM's json_tool_call structured-output pseudo-call is not treated as missing.
  • The handler's own exceptions propagate (buggy handlers must surface, not be swallowed).
  • Span errors are attached only when we actually raise; successful recoveries don't pollute traces.

ToolNotFoundAction is a dataclass with one field today (error_message) by design — a wrapper lets us add fields without breaking callers.
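As a sketch, the wrapper plus the synthesized item might look like this; the item's exact field names are an assumption for illustration, not copied from the diff:

```python
from dataclasses import dataclass

@dataclass
class ToolNotFoundAction:
    # One field today; the dataclass wrapper means new fields can be
    # added later without breaking existing callers.
    error_message: str

def synthesize_output_item(call_id: str, action: ToolNotFoundAction) -> dict:
    # Fed back to the model as the "result" of the hallucinated call,
    # so it can self-correct on its next step.
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": action.error_message,
    }
```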

Docs & example

  • New subsection in docs/running_agents.md.
  • New runnable example at examples/basic/tool_not_found_handler.py (offline, uses a scripted model).

Test plan

  • 11 tests in tests/test_tool_not_found_handler.py (including handler-raises, multi-call batch, to_input_list round-trip, LiteLLM escape hatch)
  • make format-check, make lint, make mypy clean
  • Full suite: 3835 passed
  • Example runs end-to-end

feat: add tool_not_found error handler to recover from hallucinated tool calls

When the model calls a tool that isn't registered on the agent, the SDK
raises ModelBehaviorError and kills the run — discarding however many
turns of work came before it. Users on issue openai#325 lost multi-minute
DeepSearch-style runs to a single bogus tool name.

This extends the existing RunErrorHandlers pattern with a new kind,
tool_not_found, that lets the caller recover by returning a
ToolNotFoundAction(error_message=...). The runner then synthesizes a
function_call_output item carrying that message and continues the turn;
the model sees the error on its next step and can retry with a valid
tool name. Returning None (or not registering a handler) preserves the
existing raise behavior.

The resolver pre-scans the model response for unknown tool calls,
invokes the user handler (sync or async) once per missing call, and
passes the resolved {call_id: ToolNotFoundAction} map into
process_model_response — which already had two raise sites for
function-tool and custom-tool lookups. The pre-scan honors the LiteLLM
structured-output escape hatch (json_tool_call under an output_schema)
so legitimate pseudo-calls don't spuriously fire the handler, and span
errors are only attached when we're actually raising (successful
recovery does not pollute traces).

Ships with docs under running_agents.md and a self-contained runnable
example at examples/basic/tool_not_found_handler.py.

Fixes openai#325
@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request feature:core labels Apr 19, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49ccd77fcb


Comment on lines +2035 to +2036
if isinstance(tool, FunctionTool | CustomTool):
    names.append(tool.name)

P2: Use qualified names in available_tools for namespaced tools

_collect_available_tool_names builds available_tools from tool.name, which strips namespace metadata for namespaced FunctionTools. In agents using tool_namespace(...), handlers will see suggestions like search instead of finance.search; if that message is echoed back to the model, the next call is likely to use a bare name that dispatch cannot resolve (function lookup is namespace-aware), so recovery can loop on repeated tool_not_found errors. Populate this list with qualified identifiers (or include namespace explicitly) so the handler can provide runnable alternatives.


@seratch seratch marked this pull request as draft April 19, 2026 13:13

seratch commented Apr 19, 2026

Thanks for sharing this. Providing some solutions for handling ModelBehaviorError is worth considering, but this tool-not-found pattern is not the thing we'd like to add to this SDK.


sjhddh commented Apr 19, 2026

Thanks for the quick look, @seratch — I appreciate the direction.

Happy to rework if you can share a bit more on what shape you'd find acceptable. A few options I'm weighing:

  1. Generic model_behavior_error handler (covers tool-not-found, invalid JSON, missing call_id, etc.) via the existing RunErrorHandlers TypedDict — a single kind with the full ModelBehaviorError instance as input. Keeps the user-facing surface to one key.
  2. Extending ModelSettings.retry to retry ModelBehaviorError as a transient class, leaving recovery shape to the existing retry machinery.
  3. Something else entirely — e.g., an internal "inform the model and continue" default, with no new public surface at all.

If #2 or #3 is closer to what you have in mind, happy to close this and open a scoped-down version. If #1 is workable, I can pivot the existing diff. Let me know what pattern fits the SDK's direction.
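For concreteness, option 1's surface might look something like the sketch below. All names and shapes here are purely illustrative, a hypothetical extension of the RunErrorHandlers TypedDict rather than a concrete API proposal:

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypedDict

@dataclass
class ModelBehaviorAction:
    # Illustrative recovery value, mirroring ToolNotFoundAction.
    error_message: str

class RunErrorHandlers(TypedDict, total=False):
    # A single generic key: the handler receives the full exception
    # (tool-not-found, invalid JSON, missing call_id, ...) and returns
    # an action to recover, or None to re-raise.
    model_behavior_error: Callable[[Exception], Optional[ModelBehaviorAction]]

def on_model_behavior_error(exc: Exception) -> Optional[ModelBehaviorAction]:
    # Simplest possible recovery: echo the error text back to the model.
    return ModelBehaviorAction(error_message=str(exc))
```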

Development

Successfully merging this pull request may close these issues.

Retry mechanism for ModelBehaviorError