Commit 49ccd77

feat: #325 add tool_not_found error handler for model-hallucinated tool calls
When the model calls a tool that isn't registered on the agent, the SDK raises ModelBehaviorError and kills the run, discarding however many turns of work came before it. Users on issue #325 lost multi-minute DeepSearch-style runs to a single bogus tool name.

This extends the existing RunErrorHandlers pattern with a new kind, tool_not_found, that lets the caller recover by returning a ToolNotFoundAction(error_message=...). The runner then synthesizes a function_call_output item carrying that message and continues the turn; the model sees the error on its next step and can retry with a valid tool name. Returning None (or not registering a handler) preserves the existing raise behavior.

The resolver pre-scans the model response for unknown tool calls, invokes the user handler (sync or async) once per missing call, and passes the resolved {call_id: ToolNotFoundAction} map into process_model_response, which already had two raise sites for function-tool and custom-tool lookups. The pre-scan honors the LiteLLM structured-output escape hatch (json_tool_call under an output_schema) so legitimate pseudo-calls don't spuriously fire the handler, and span errors are only attached when we're actually raising (successful recovery does not pollute traces).

Ships with docs under running_agents.md and a self-contained runnable example at examples/basic/tool_not_found_handler.py.

Fixes #325
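The pre-scan described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's resolver: `ToolCall`, `Action`, `UnknownToolError`, and `prescan` are hypothetical stand-ins for the response items, `ToolNotFoundAction`, `ModelBehaviorError`, and the internal resolver. It only handles sync handlers and ignores the `json_tool_call` escape hatch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    """Hypothetical stand-in for a tool call found in a model response."""

    call_id: str
    name: str


@dataclass
class Action:
    """Hypothetical stand-in for ToolNotFoundAction."""

    error_message: str


class UnknownToolError(Exception):
    """Hypothetical stand-in for ModelBehaviorError."""


Handler = Callable[[str, list[str]], Optional[Action]]


def prescan(
    calls: list[ToolCall], registered: list[str], handler: Optional[Handler]
) -> dict[str, Action]:
    """Map each unknown call_id to its recovery action, or raise if unrecovered."""
    actions: dict[str, Action] = {}
    for call in calls:
        if call.name in registered:
            continue  # known tool: nothing to resolve
        # The handler is invoked once per missing call.
        action = handler(call.name, registered) if handler else None
        if action is None:
            # No handler, or the handler declined: keep the raise behavior.
            raise UnknownToolError(f"Tool {call.name} not found")
        actions[call.call_id] = action
    return actions


def nudge(tool_name: str, available: list[str]) -> Action:
    return Action(f"Tool {tool_name!r} does not exist. Available: {available}.")


calls = [ToolCall("call-1", "search_linkedin"), ToolCall("call-2", "search_web")]
resolved = prescan(calls, ["search_web"], nudge)
```

Only `call-1` ends up in `resolved`; the valid `search_web` call is left for normal processing.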
1 parent da82b2c commit 49ccd77

10 files changed: +873 -1 lines changed

docs/running_agents.md

Lines changed: 25 additions & 0 deletions
@@ -441,6 +441,31 @@ print(result.final_output)
 
 Set `include_in_history=False` when you do not want the fallback output appended to conversation history.
 
+### Recovering from hallucinated tool calls
+
+Models occasionally call a tool name that was never registered on the agent (issue [#325](https://github.com/openai/openai-agents-python/issues/325)). By default the SDK raises `ModelBehaviorError` and the run ends, discarding prior work. Register a `"tool_not_found"` handler to turn that crash into a recoverable nudge: the handler returns a [`ToolNotFoundAction`][agents.ToolNotFoundAction] with a model-visible error message, the runner injects it as a synthetic tool output, and the model self-corrects on the next turn. Returning `None` (or not registering a handler) preserves the existing raise behavior. Recovery is bounded by the run's `max_turns`, so a model that keeps hallucinating still terminates.
+
+```python
+from agents import Agent, Runner, ToolNotFoundAction, ToolNotFoundErrorHandlerInput
+
+
+def on_tool_not_found(data: ToolNotFoundErrorHandlerInput[None]) -> ToolNotFoundAction:
+    return ToolNotFoundAction(
+        error_message=(
+            f"Tool {data.tool_name!r} does not exist. Available: {data.available_tools}."
+        )
+    )
+
+
+result = Runner.run_sync(
+    agent,
+    "find me profiles related to Anthropic",
+    error_handlers={"tool_not_found": on_tool_not_found},
+)
+```
+
+See [`examples/basic/tool_not_found_handler.py`](https://github.com/openai/openai-agents-python/blob/main/examples/basic/tool_not_found_handler.py) for a full runnable example.
+
 ## Durable execution integrations and human-in-the-loop
 
 For tool approval pause/resume patterns, start with the dedicated [Human-in-the-loop guide](human_in_the_loop.md).
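A handler is free to build a more targeted message than a plain list of available tools. The sketch below suggests the closest registered name using the standard library's `difflib`. `NotFoundInput` and `suggest_message` are hypothetical names; the stand-in dataclass only mirrors the `tool_name` and `available_tools` fields that the documented handler reads, so the snippet runs without the SDK.

```python
import difflib
from dataclasses import dataclass


@dataclass
class NotFoundInput:
    """Hypothetical stand-in for ToolNotFoundErrorHandlerInput (two fields only)."""

    tool_name: str
    available_tools: list[str]


def suggest_message(data: NotFoundInput) -> str:
    """Build a model-visible error that names the closest registered tool, if any."""
    close = difflib.get_close_matches(data.tool_name, data.available_tools, n=1)
    if close:
        return f"Tool {data.tool_name!r} does not exist. Did you mean {close[0]!r}?"
    # No sufficiently similar name: fall back to listing everything.
    return f"Tool {data.tool_name!r} does not exist. Available: {data.available_tools}."


typo = suggest_message(NotFoundInput("serch_web", ["search_web", "get_weather"]))
unrelated = suggest_message(NotFoundInput("zzz", ["search_web"]))
```

In a real handler the returned string would presumably be passed as `error_message` to `ToolNotFoundAction`.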
examples/basic/tool_not_found_handler.py

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
+"""Recovering from a model that calls a tool that doesn't exist.
+
+Large models occasionally "hallucinate" a tool name that isn't registered on the agent --
+for example they call ``search_linkedin`` when only ``search_web`` is available. Without a
+handler, the SDK raises ``ModelBehaviorError`` and the entire run is lost.
+
+Registering a ``tool_not_found`` error handler lets you turn that crash into a recoverable
+nudge: the handler returns a ``ToolNotFoundAction`` with an error message, the runner
+injects that message as a synthetic tool output, and the model self-corrects on the next
+turn.
+
+This example uses a tiny scripted ``Model`` subclass so it runs offline -- no API key
+needed. See issue #325 for the real-world report that motivated this API.
+
+    $ python examples/basic/tool_not_found_handler.py
+"""
+
+from __future__ import annotations
+
+import asyncio
+from collections.abc import AsyncIterator
+from typing import Any
+
+from openai.types.responses import ResponseFunctionToolCall, ResponseOutputMessage
+
+from agents import (
+    Agent,
+    ModelResponse,
+    Runner,
+    ToolNotFoundAction,
+    ToolNotFoundErrorHandlerInput,
+    Usage,
+    function_tool,
+)
+from agents.agent_output import AgentOutputSchemaBase
+from agents.handoffs import Handoff
+from agents.items import TResponseInputItem, TResponseStreamEvent
+from agents.model_settings import ModelSettings
+from agents.models.interface import Model, ModelTracing
+from agents.tool import Tool
+
+
+@function_tool
+def search_web(query: str) -> str:
+    """The only real tool on the agent."""
+    return f"results for: {query}"
+
+
+class ScriptedModel(Model):
+    """Plays back a fixed script of model responses so the example runs offline."""
+
+    def __init__(self, scripted_outputs: list[list[Any]]) -> None:
+        self._outputs = list(scripted_outputs)
+
+    async def get_response(self, *args: Any, **kwargs: Any) -> ModelResponse:
+        output = self._outputs.pop(0) if self._outputs else []
+        return ModelResponse(output=output, usage=Usage(), response_id="scripted")
+
+    def stream_response(  # pragma: no cover - not exercised here
+        self,
+        system_instructions: str | None,
+        input: str | list[TResponseInputItem],
+        model_settings: ModelSettings,
+        tools: list[Tool],
+        output_schema: AgentOutputSchemaBase | None,
+        handoffs: list[Handoff],
+        tracing: ModelTracing,
+        *,
+        previous_response_id: str | None = None,
+        conversation_id: str | None = None,
+        prompt: Any | None = None,
+    ) -> AsyncIterator[TResponseStreamEvent]:
+        raise NotImplementedError("streaming not used in this example")
+
+
+def on_tool_not_found(data: ToolNotFoundErrorHandlerInput[Any]) -> ToolNotFoundAction:
+    """Build a model-visible error so the model can pick a valid tool on its next step."""
+    available = ", ".join(data.available_tools) or "(none)"
+    return ToolNotFoundAction(
+        error_message=(
+            f"Tool {data.tool_name!r} is not registered on this agent. "
+            f"Available tools: [{available}]. Pick one of those and try again."
+        )
+    )
+
+
+async def main() -> None:
+    # Turn 1: the model hallucinates a tool that doesn't exist.
+    # Turn 2: after the handler injects the error, the model recovers with a final answer.
+    scripted_model = ScriptedModel(
+        [
+            [
+                ResponseFunctionToolCall(
+                    id="call-1",
+                    call_id="call-1",
+                    type="function_call",
+                    name="search_linkedin",  # intentionally unknown
+                    arguments='{"query": "Anthropic"}',
+                )
+            ],
+            [
+                ResponseOutputMessage.model_validate(
+                    {
+                        "id": "msg-1",
+                        "type": "message",
+                        "role": "assistant",
+                        "status": "completed",
+                        "content": [
+                            {
+                                "type": "output_text",
+                                "text": "Sorry, I used the wrong tool. Here's what I got from search_web instead.",
+                                "annotations": [],
+                                "logprobs": [],
+                            }
+                        ],
+                    }
+                )
+            ],
+        ]
+    )
+
+    agent = Agent(
+        name="recoverable_agent",
+        instructions="You are a helpful assistant.",
+        model=scripted_model,
+        tools=[search_web],
+    )
+
+    result = await Runner.run(
+        agent,
+        input="find me profiles related to Anthropic",
+        error_handlers={"tool_not_found": on_tool_not_found},
+    )
+
+    print("Final output:")
+    print(result.final_output)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
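The example's script recovers on turn 2. To see why recovery still terminates when the model never stops hallucinating, the turn loop can be simulated without the SDK. The sketch below is an assumption-laden toy, not the runner: `run_scripted` and `MaxTurnsReached` are hypothetical, and each script entry is simply a tool name the "model" calls, or `None` for a final answer.

```python
from typing import Optional


class MaxTurnsReached(Exception):
    """Hypothetical stand-in for the SDK's MaxTurnsExceeded."""


def run_scripted(
    script: list[Optional[str]], registered: set[str], max_turns: int
) -> tuple[int, list[str]]:
    """Play back a script; return (turns used, injected tool outputs)."""
    history: list[str] = []
    for turn in range(1, max_turns + 1):
        # Past the end of the script, the toy model gives a final answer.
        step = script[turn - 1] if turn <= len(script) else None
        if step is None:
            return turn, history
        if step not in registered:
            # Recovery path: inject an error output and spend another turn.
            history.append(f"error: tool {step!r} is not registered")
            continue
        history.append(f"output of {step}")
    raise MaxTurnsReached(f"no final answer after {max_turns} turns")


turns, injected = run_scripted(["search_linkedin", None], {"search_web"}, max_turns=10)
```

Every recovery consumes a turn, so a script of nothing but bogus names raises `MaxTurnsReached` once `max_turns` is spent.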

src/agents/__init__.py

Lines changed: 6 additions & 0 deletions
@@ -112,6 +112,9 @@
     RunErrorHandlerInput,
     RunErrorHandlerResult,
     RunErrorHandlers,
+    ToolNotFoundAction,
+    ToolNotFoundErrorHandler,
+    ToolNotFoundErrorHandlerInput,
 )
 from .run_state import RunState
 from .stream_events import (
@@ -420,6 +423,9 @@ def enable_verbose_stdout_logging():
     "RunErrorHandlerInput",
     "RunErrorHandlerResult",
     "RunErrorHandlers",
+    "ToolNotFoundAction",
+    "ToolNotFoundErrorHandler",
+    "ToolNotFoundErrorHandlerInput",
     "AgentToolInvocation",
     "RunResult",
     "RunResultStreaming",

src/agents/run.py

Lines changed: 4 additions & 0 deletions
@@ -1194,6 +1194,8 @@ def _finalize_result(result: RunResult) -> RunResult:
                             ),
                             reasoning_item_id_policy=resolved_reasoning_item_id_policy,
                             prompt_cache_key_resolver=prompt_cache_key_resolver,
+                            error_handlers=error_handlers,
+                            model_responses_so_far=model_responses,
                         )
                     )
 
@@ -1249,6 +1251,8 @@ def _finalize_result(result: RunResult) -> RunResult:
                             ),
                             reasoning_item_id_policy=resolved_reasoning_item_id_policy,
                             prompt_cache_key_resolver=prompt_cache_key_resolver,
+                            error_handlers=error_handlers,
+                            model_responses_so_far=model_responses,
                         )
                     finally:
                         attach_usage_to_span(

src/agents/run_config.py

Lines changed: 1 addition & 1 deletion
@@ -284,7 +284,7 @@ class RunOptions(TypedDict, Generic[TContext]):
     """The session for the run."""
 
     error_handlers: NotRequired[RunErrorHandlers[TContext] | None]
-    """Error handlers keyed by error kind. Currently supports max_turns."""
+    """Error handlers keyed by error kind. Supports ``max_turns`` and ``tool_not_found``."""
 
 
 __all__ = [

src/agents/run_error_handlers.py

Lines changed: 51 additions & 0 deletions
@@ -47,10 +47,58 @@ class RunErrorHandlerResult:
 ]
 
 
+@dataclass
+class ToolNotFoundErrorHandlerInput(Generic[TContext]):
+    """Input passed to the ``tool_not_found`` error handler.
+
+    The handler is invoked when the model calls a tool that is not registered on the current
+    agent. Returning :class:`ToolNotFoundAction` tells the runner to inject a synthetic tool
+    output with ``error_message`` so the model can self-correct on the next turn. Returning
+    ``None`` re-raises the original :class:`ModelBehaviorError`.
+    """
+
+    tool_name: str
+    """Name of the tool the model tried to call."""
+
+    available_tools: list[str]
+    """Names of tools actually registered on the agent (function + custom + handoffs)."""
+
+    agent: Agent[Any]
+    """The agent that received the bogus tool call."""
+
+    context: RunContextWrapper[TContext]
+    """The run context wrapper."""
+
+    run_data: RunErrorData
+    """Snapshot of run data at the moment the error occurred."""
+
+
+@dataclass
+class ToolNotFoundAction:
+    """Instructs the runner to recover from a tool-not-found error.
+
+    The runner appends a synthetic ``function_call_output`` item containing ``error_message`` to
+    the conversation, then continues the turn. The model will see the error on its next step and
+    can retry with a valid tool name.
+
+    Note: recovery is bounded by the run's ``max_turns`` setting. A model that repeatedly
+    hallucinates tool calls will eventually hit that limit and raise ``MaxTurnsExceeded``.
+    """
+
+    error_message: str
+
+
+ToolNotFoundErrorHandler = Callable[
+    [ToolNotFoundErrorHandlerInput[TContext]],
+    MaybeAwaitable["ToolNotFoundAction | None"],
+]
+
+
 class RunErrorHandlers(TypedDict, Generic[TContext], total=False):
     """Error handlers keyed by error kind."""
 
     max_turns: RunErrorHandler[TContext]
+    tool_not_found: ToolNotFoundErrorHandler[TContext]
 
 
 __all__ = [
@@ -59,4 +107,7 @@ class RunErrorHandlers(TypedDict, Generic[TContext], total=False):
     "RunErrorHandlerInput",
     "RunErrorHandlerResult",
     "RunErrorHandlers",
+    "ToolNotFoundAction",
+    "ToolNotFoundErrorHandler",
+    "ToolNotFoundErrorHandlerInput",
 ]
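The `ToolNotFoundAction` docstring says the runner appends a synthetic `function_call_output` item carrying `error_message`. The code that builds that item is not part of this excerpt, so the sketch below is only a guess at its shape, based on the Responses API input item for function call outputs (`type`, `call_id`, `output`). The helper name `synthetic_tool_output` is hypothetical.

```python
from typing import Any


def synthetic_tool_output(call_id: str, error_message: str) -> dict[str, Any]:
    """Build an input item that answers a tool call with an error message.

    The call_id must match the hallucinated call so the model can pair them.
    """
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": error_message,
    }


item = synthetic_tool_output(
    "call-1", "Tool 'search_linkedin' is not registered on this agent."
)
```

Reusing the original `call_id` is what lets the model associate the error with its own bogus call on the next step.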

src/agents/run_internal/error_handlers.py

Lines changed: 41 additions & 0 deletions
@@ -23,6 +23,8 @@
     RunErrorHandlerInput,
     RunErrorHandlerResult,
     RunErrorHandlers,
+    ToolNotFoundAction,
+    ToolNotFoundErrorHandlerInput,
 )
 from .items import ReasoningItemIdPolicy, run_item_to_input_item
 from .turn_preparation import get_output_schema
@@ -161,3 +163,42 @@ async def resolve_run_error_handler_result(
             raise UserError("Invalid run error handler result.") from exc
         return RunErrorHandlerResult(final_output=result)
     return RunErrorHandlerResult(final_output=result)
+
+
+async def resolve_tool_not_found_action(
+    *,
+    error_handlers: RunErrorHandlers[TContext] | None,
+    tool_name: str,
+    available_tools: list[str],
+    agent: Agent[Any],
+    context_wrapper: RunContextWrapper[TContext],
+    run_data: RunErrorData,
+) -> ToolNotFoundAction | None:
+    """Invoke the ``tool_not_found`` handler (if configured) and normalize its return value.
+
+    Returns a :class:`ToolNotFoundAction` when the handler asks the runner to recover, or
+    ``None`` when no handler is registered or the handler opts to re-raise.
+    """
+    if not error_handlers:
+        return None
+    handler = error_handlers.get("tool_not_found")
+    if handler is None:
+        return None
+    handler_input = ToolNotFoundErrorHandlerInput(
+        tool_name=tool_name,
+        available_tools=available_tools,
+        agent=agent,
+        context=context_wrapper,
+        run_data=run_data,
+    )
+    result: Any = handler(handler_input)
+    if inspect.isawaitable(result):
+        result = await result
+    if result is None:
+        return None
+    if isinstance(result, ToolNotFoundAction):
+        return result
+    raise UserError(
+        "tool_not_found handler must return ToolNotFoundAction or None, "
+        f"got {type(result).__name__}."
+    )
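The sync-or-async normalization in `resolve_tool_not_found_action` can be exercised in isolation. The sketch below mirrors its three outcomes (recover, decline, reject a bad return type) with hypothetical stand-ins: `Action` replaces `ToolNotFoundAction`, `TypeError` replaces `UserError`, and the handler takes only a tool name instead of the full input object.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Action:
    """Hypothetical stand-in for ToolNotFoundAction."""

    error_message: str


async def call_handler(handler: Callable[[str], Any], tool_name: str) -> Optional[Action]:
    """Call a sync or async handler and validate what it returns."""
    result = handler(tool_name)
    if inspect.isawaitable(result):
        # Async handlers return a coroutine; await it to get the real value.
        result = await result
    if result is None or isinstance(result, Action):
        return result
    raise TypeError(f"handler must return Action or None, got {type(result).__name__}.")


def sync_handler(name: str) -> Action:
    return Action(f"{name} missing")


async def async_handler(name: str) -> Action:
    return Action(f"{name} missing (async)")


def declining_handler(name: str) -> None:
    return None


def bad_handler(name: str) -> str:
    return "oops"


sync_result = asyncio.run(call_handler(sync_handler, "search_linkedin"))
async_result = asyncio.run(call_handler(async_handler, "search_linkedin"))
declined = asyncio.run(call_handler(declining_handler, "search_linkedin"))
```

Checking `inspect.isawaitable` on the return value, rather than inspecting the handler itself, also covers callables that are not `async def` but still return an awaitable.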

src/agents/run_internal/run_loop.py

Lines changed: 8 additions & 0 deletions
@@ -1028,6 +1028,7 @@ async def _save_stream_items_without_count(
                         ),
                         reasoning_item_id_policy=resolved_reasoning_item_id_policy,
                         prompt_cache_key_resolver=prompt_cache_key_resolver,
+                        error_handlers=error_handlers,
                     )
                 finally:
                     attach_usage_to_span(
@@ -1246,6 +1247,7 @@ async def run_single_turn_streamed(
     pending_server_items: list[RunItem] | None = None,
     reasoning_item_id_policy: ReasoningItemIdPolicy | None = None,
     prompt_cache_key_resolver: PromptCacheKeyResolver | None = None,
+    error_handlers: RunErrorHandlers[TContext] | None = None,
 ) -> SingleStepResult:
     """Run a single streamed turn and emit events as results arrive."""
     public_agent = bindings.public_agent
@@ -1636,6 +1638,8 @@ async def rewind_model_request() -> None:
         server_manages_conversation=server_conversation_tracker is not None,
         event_queue=streamed_result._event_queue,
         before_side_effects=raise_if_input_guardrail_tripwire_known,
+        error_handlers=error_handlers,
+        raw_responses_so_far=streamed_result.raw_responses,
     )
 
     items_to_filter = session_items_for_turn(single_step_result)
@@ -1697,6 +1701,8 @@ async def run_single_turn(
     session_items_to_rewind: list[TResponseInputItem] | None = None,
     reasoning_item_id_policy: ReasoningItemIdPolicy | None = None,
     prompt_cache_key_resolver: PromptCacheKeyResolver | None = None,
+    error_handlers: RunErrorHandlers[TContext] | None = None,
+    model_responses_so_far: list[ModelResponse] | None = None,
 ) -> SingleStepResult:
     """Run a single non-streaming turn of the agent loop."""
     public_agent = bindings.public_agent
@@ -1766,6 +1772,8 @@ async def run_single_turn(
         run_config=run_config,
         tool_use_tracker=tool_use_tracker,
         server_manages_conversation=server_conversation_tracker is not None,
+        error_handlers=error_handlers,
+        raw_responses_so_far=model_responses_so_far,
     )
 
 