feat: add tool_not_found error handler to recover from hallucinated tool calls (#325)#2957
feat: add tool_not_found error handler to recover from hallucinated tool calls (#325)#2957sjhddh wants to merge 1 commit intoopenai:mainfrom
Conversation
…ted tool calls When the model calls a tool that isn't registered on the agent, the SDK raises ModelBehaviorError and kills the run — discarding however many turns of work came before it. Users on issue openai#325 lost multi-minute DeepSearch-style runs to a single bogus tool name. This extends the existing RunErrorHandlers pattern with a new kind, tool_not_found, that lets the caller recover by returning a ToolNotFoundAction(error_message=...). The runner then synthesizes a function_call_output item carrying that message and continues the turn; the model sees the error on its next step and can retry with a valid tool name. Returning None (or not registering a handler) preserves the existing raise behavior. The resolver pre-scans the model response for unknown tool calls, invokes the user handler (sync or async) once per missing call, and passes the resolved {call_id: ToolNotFoundAction} map into process_model_response — which already had two raise sites for function-tool and custom-tool lookups. The pre-scan honors the LiteLLM structured-output escape hatch (json_tool_call under an output_schema) so legitimate pseudo-calls don't spuriously fire the handler, and span errors are only attached when we're actually raising (successful recovery does not pollute traces). Ships with docs under running_agents.md and a self-contained runnable example at examples/basic/tool_not_found_handler.py. Fixes openai#325
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 49ccd77fcb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if isinstance(tool, FunctionTool | CustomTool): | ||
| names.append(tool.name) |
There was a problem hiding this comment.
Use qualified names in available_tools for namespaced tools
_collect_available_tool_names builds available_tools from tool.name, which strips namespace metadata for namespaced FunctionTools. In agents using tool_namespace(...), handlers will see suggestions like search instead of finance.search; if that message is echoed back to the model, the next call is likely to use a bare name that dispatch cannot resolve (function lookup is namespace-aware), so recovery can loop on repeated tool_not_found errors. Populate this list with qualified identifiers (or include namespace explicitly) so the handler can provide runnable alternatives.
Useful? React with 👍 / 👎.
|
Thanks for sharing this. Providing some solutions for handling ModelBehaviorError is worth considering, but this tool-not-found pattern is not the thing we'd like to add to this SDK. |
|
Thanks for the quick look, @seratch — I appreciate the direction. Happy to rework if you can share a bit more on what shape you'd find acceptable. A few options I'm weighing:
If #2 or #3 is closer to what you have in mind, happy to close this and open a scoped-down version. If #1 is workable, I can pivot the existing diff. Let me know what pattern fits the SDK's direction. |
Summary
Fixes #325. When the model calls a tool that isn't registered on the agent, the SDK currently raises
ModelBehaviorErrorand kills the run, discarding every turn that came before. Users on the 2+-year-old issue report losing multi-minute DeepSearch-style runs to a single typo. This PR adds a recoverable escape hatch.What's new
Register a handler alongside the existing
max_turnsone:The runner pre-scans the model response for unknown tool calls, invokes the handler (sync or async) once per unknown call, synthesizes a
function_call_outputwith the handler's message, and continues the turn. The model sees the error on its next step and self-corrects.Contract
ToolNotFoundAction(error_message=...)to recover; returnNoneor register nothing to preserve today's raise behavior.max_turns, so repeated hallucinations still terminate.json_tool_callstructured-output pseudo-call is not treated as missing.ToolNotFoundActionis a dataclass with one field today (error_message) by design — a wrapper lets us add fields without breaking callers.Docs & example
docs/running_agents.md.examples/basic/tool_not_found_handler.py(offline, uses a scripted model).Test plan
tests/test_tool_not_found_handler.py(including handler-raises, multi-call batch,to_input_listround-trip, LiteLLM escape hatch)make format-check,make lint,make mypyclean