### Initial Checks

- [x] I confirm that I'm using the latest version of Pydantic AI
- [x] I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue
### Description

Today, `UsageLimitExceeded` is enforced for:

- Model request count (`request_limit`)
- Token limits (`input_tokens_limit`, `output_tokens_limit`, `total_tokens_limit`)

There is no enforcement based on the number of tool calls executed. As a result, a single model turn that triggers many tool calls is not bounded by usage limits until the next model request is made. This is too coarse when the goal is to cap the exact number of tool invocations.
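For reference, this is how the existing limits are applied from the caller's side today (the model name is illustrative; any provider behaves the same way):

```python
from pydantic_ai import Agent
from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits

agent = Agent('openai:gpt-4o')

try:
    # Only request and token limits exist today; nothing here can cap
    # the number of tool calls executed within a single model turn.
    result = agent.run_sync(
        'Answer the question',
        usage_limits=UsageLimits(request_limit=3, total_tokens_limit=10_000),
    )
except UsageLimitExceeded as e:
    print(e)
```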
**Current behavior**

- Limits only check requests/tokens (`pydantic-ai/pydantic_ai_slim/pydantic_ai/usage.py`, lines 281 to 297 in 09d00e4):

```python
def check_before_request(self, usage: RunUsage) -> None:
    """Raises a `UsageLimitExceeded` exception if the next request would exceed any of the limits."""
    request_limit = self.request_limit
    if request_limit is not None and usage.requests >= request_limit:
        raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

    input_tokens = usage.input_tokens
    if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
        raise UsageLimitExceeded(
            f'The next request would exceed the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})'
        )

    total_tokens = usage.total_tokens
    if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
        raise UsageLimitExceeded(
            f'The next request would exceed the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})'
        )
```
- Requests are incremented per model request (`pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py`, lines 306 to 311 and 338 to 340 in 09d00e4):

```python
model_settings, model_request_parameters, message_history, run_context = await self._prepare_request(ctx)
async with ctx.deps.model.request_stream(
    message_history, model_settings, model_request_parameters, run_context
) as streamed_response:
    self._did_stream = True
    ctx.state.usage.requests += 1
```

```python
model_settings, model_request_parameters, message_history, _ = await self._prepare_request(ctx)
model_response = await ctx.deps.model.request(message_history, model_settings, model_request_parameters)
ctx.state.usage.requests += 1
```
- Tool calls are executed but not counted anywhere (`pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py`, lines 585 to 603 in 09d00e4):

```python
async def process_function_tools(  # noqa: C901
    tool_manager: ToolManager[DepsT],
    tool_calls: list[_messages.ToolCallPart],
    final_result: result.FinalResult[NodeRunEndT] | None,
    ctx: GraphRunContext[GraphAgentState, GraphAgentDeps[DepsT, NodeRunEndT]],
    output_parts: list[_messages.ModelRequestPart],
    output_final_result: deque[result.FinalResult[NodeRunEndT]] = deque(maxlen=1),
) -> AsyncIterator[_messages.HandleResponseEvent]:
    """Process function (i.e., non-result) tool calls in parallel.

    Also add stub return parts for any other tools that need it.

    Because async iterators can't have return values, we use `output_parts` and
    `output_final_result` as output arguments.
    """
    tool_calls_by_kind: dict[ToolKind | Literal['unknown'], list[_messages.ToolCallPart]] = defaultdict(list)
    for call in tool_calls:
        tool_def = tool_manager.get_tool_def(call.tool_name)
        kind = tool_def.kind if tool_def else 'unknown'
        tool_calls_by_kind[kind].append(call)
```
**Expected behavior**

- Provide a way to cap the exact number of tool calls in a run. Exceeding this cap should raise `UsageLimitExceeded` before invoking the tool that would exceed the limit (see the usage sketch below).
- Note: it's unclear whether current behavior matches intended semantics; I'm unsure if usage limits should be enforced across tool calls (vs. requests). This issue proposes making tool-call caps explicit.
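A rough sketch of what this could look like from the caller's side; `tool_calls_limit` is the proposed field (it does not exist today), and the model name is illustrative:

```python
from pydantic_ai import Agent
from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits

agent = Agent('openai:gpt-4o')

@agent.tool_plain
def lookup(term: str) -> str:
    return f'result for {term}'

try:
    # tool_calls_limit is the proposed field; it does not exist yet.
    result = agent.run_sync(
        'Research this topic thoroughly',
        usage_limits=UsageLimits(tool_calls_limit=5),
    )
except UsageLimitExceeded as e:
    print(e)  # raised before the sixth tool invocation runs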
**Proposed solution**

- Add a tool-call counter to `RunUsage` and a corresponding limit in `UsageLimits` (a minimal sketch follows this list):
  - `RunUsage.tool_calls: int = 0`
  - `UsageLimits.tool_calls_limit: int | None = None`
- Increment `usage.tool_calls` for every actual tool invocation (i.e., every call to `tool_manager.handle_call`), including parallel invocations, and retries if they imply another invocation. Candidate places:
  - In `process_function_tools`, just before each `handle_call(...)`.
  - Or centrally inside `ToolManager.handle_call(...)`, to ensure consistent counting for all tool kinds.
- Enforce the limit in a new check before each tool invocation:
  - Raise `UsageLimitExceeded(f'The next tool call would exceed the tool_calls_limit of {limit} (tool_calls={usage.tool_calls})')`.
- Keep the default `None` to disable the limit and preserve backwards compatibility.
- Document that this counts actual tool executions, not tool-call messages that are skipped or replaced by "final result already processed" stubs.
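A minimal sketch of the proposed fields and check, mirroring the existing `check_before_request`; `check_before_tool_call` is a hypothetical name, and the classes below are simplified stand-ins for the real ones in `usage.py`:

```python
from dataclasses import dataclass

from pydantic_ai.exceptions import UsageLimitExceeded


@dataclass
class RunUsage:
    requests: int = 0
    tool_calls: int = 0  # proposed: incremented once per actual tool invocation


@dataclass
class UsageLimits:
    request_limit: int | None = 50
    tool_calls_limit: int | None = None  # proposed: None preserves current behavior

    def check_before_tool_call(self, usage: RunUsage) -> None:
        """Raise `UsageLimitExceeded` if the next tool call would exceed the limit.

        Hypothetical method, mirroring the existing `check_before_request`.
        """
        if self.tool_calls_limit is not None and usage.tool_calls >= self.tool_calls_limit:
            raise UsageLimitExceeded(
                f'The next tool call would exceed the tool_calls_limit of '
                f'{self.tool_calls_limit} (tool_calls={usage.tool_calls})'
            )
```

The call sites in `process_function_tools` (or centrally in `ToolManager.handle_call`) would call `check_before_tool_call` and then increment `usage.tool_calls` before executing the tool.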
**Why this is needed**

`request_limit` caps model turns, not tool calls. A single turn can perform many tool calls, so `request_limit` is not a precise guardrail for tool execution costs or runaway tool loops within a single turn.
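To illustrate the gap, here is a sketch using `FunctionModel` to simulate a model that fans out into many tool calls in a single turn; treat the details as illustrative, but the point is that `request_limit=2` does not stop it:

```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel
from pydantic_ai.usage import UsageLimits


def fan_out_model(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    if len(messages) == 1:
        # First turn: fan out into 100 tool calls at once.
        return ModelResponse(parts=[ToolCallPart('lookup', {'term': str(i)}) for i in range(100)])
    return ModelResponse(parts=[TextPart('done')])


agent = Agent(FunctionModel(fan_out_model))

@agent.tool_plain
def lookup(term: str) -> str:
    return f'result for {term}'

# request_limit=2 permits this run even though it executes 100 tool calls,
# because all 100 happen inside a single model turn.
result = agent.run_sync('go', usage_limits=UsageLimits(request_limit=2))
```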
**Notes/edge cases**

- Clarify whether to count:
  - Output tool calls that do not result in `handle_call` when a final result is already present.
  - Deferred tool calls (likely not counted unless executed).
  - Retries: each execution attempt should count as a tool call.
- Consider telemetry alignment by adding `gen_ai.usage.details.tool_calls` to the OpenTelemetry attributes.
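For the telemetry point, a sketch of what recording the counter could look like with the standard OpenTelemetry API; `record_tool_call_usage` is a hypothetical helper, and the attribute name is the one suggested above:

```python
from opentelemetry import trace


def record_tool_call_usage(tool_calls: int) -> None:
    # Hypothetical helper: attach the proposed counter to the current span,
    # alongside the existing gen_ai.usage.* attributes.
    span = trace.get_current_span()
    span.set_attribute('gen_ai.usage.details.tool_calls', tool_calls)
```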
**Docs**

- Current docs recommend `request_limit` to prevent "infinite tool calling," which only limits turns.
### Example Code

### Python, Pydantic AI & LLM client version
Python 3.13
Pydantic AI 0.7.2