
Enforce usage limits by exact tool call count (not just request/turn count) #2593

@tradeqvest

Description

  • Today, UsageLimitExceeded is enforced for:

    • Model request count (request_limit)
    • Token limits (input_tokens_limit, output_tokens_limit, total_tokens_limit)
  • There is no enforcement based on the number of tool calls executed. As a result, a single model turn that triggers many tool calls is not bounded by usage limits until the next model request is made. This is coarse when the goal is to cap the exact number of tool invocations.
  • Current behavior

    • Limits only check requests/tokens:

      def check_before_request(self, usage: RunUsage) -> None:
          """Raises a `UsageLimitExceeded` exception if the next request would exceed any of the limits."""
          request_limit = self.request_limit
          if request_limit is not None and usage.requests >= request_limit:
              raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

          input_tokens = usage.input_tokens
          if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
              raise UsageLimitExceeded(
                  f'The next request would exceed the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})'
              )

          total_tokens = usage.total_tokens
          if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
              raise UsageLimitExceeded(
                  f'The next request would exceed the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})'
              )

    • Requests are incremented per model request:

      model_settings, model_request_parameters, message_history, run_context = await self._prepare_request(ctx)
      async with ctx.deps.model.request_stream(
          message_history, model_settings, model_request_parameters, run_context
      ) as streamed_response:
          self._did_stream = True
          ctx.state.usage.requests += 1

      model_settings, model_request_parameters, message_history, _ = await self._prepare_request(ctx)
      model_response = await ctx.deps.model.request(message_history, model_settings, model_request_parameters)
      ctx.state.usage.requests += 1

    • Tool calls are executed but not counted anywhere:

      async def process_function_tools(  # noqa: C901
          tool_manager: ToolManager[DepsT],
          tool_calls: list[_messages.ToolCallPart],
          final_result: result.FinalResult[NodeRunEndT] | None,
          ctx: GraphRunContext[GraphAgentState, GraphAgentDeps[DepsT, NodeRunEndT]],
          output_parts: list[_messages.ModelRequestPart],
          output_final_result: deque[result.FinalResult[NodeRunEndT]] = deque(maxlen=1),
      ) -> AsyncIterator[_messages.HandleResponseEvent]:
          """Process function (i.e., non-result) tool calls in parallel.

          Also add stub return parts for any other tools that need it.

          Because async iterators can't have return values, we use `output_parts` and `output_final_result` as output arguments.
          """
          tool_calls_by_kind: dict[ToolKind | Literal['unknown'], list[_messages.ToolCallPart]] = defaultdict(list)
          for call in tool_calls:
              tool_def = tool_manager.get_tool_def(call.tool_name)
              kind = tool_def.kind if tool_def else 'unknown'
              tool_calls_by_kind[kind].append(call)

  • Expected behavior

    • Provide a way to cap the exact number of tool calls in a run. Exceeding this cap should raise UsageLimitExceeded before invoking the tool that would exceed the limit (see the usage sketch below).
    • Note: it’s unclear whether current behavior matches the intended semantics; I’m unsure whether usage limits are meant to be enforced across tool calls (vs. requests). This issue proposes making tool-call caps explicit.
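
    A minimal usage sketch of the proposed API (tool_calls_limit is the hypothetical new field; everything else below is today's public API):

      from pydantic_ai import Agent
      from pydantic_ai.exceptions import UsageLimitExceeded
      from pydantic_ai.usage import UsageLimits

      agent = Agent('openai:gpt-4o')

      @agent.tool_plain
      def fetch_page(url: str) -> str:
          """A tool the model may call many times within a single turn."""
          return '...'

      try:
          result = agent.run_sync(
              'Crawl these ten pages',
              usage_limits=UsageLimits(tool_calls_limit=3),  # proposed field, not in UsageLimits today
          )
      except UsageLimitExceeded:
          ...  # raised before the fourth tool invocation runs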
  • Proposed solution

    • Add a tool-call counter to RunUsage and a corresponding limit in UsageLimits:
      • RunUsage.tool_calls: int = 0
      • UsageLimits.tool_calls_limit: int | None = None
    • Increment usage.tool_calls for every actual tool invocation (i.e., every call to tool_manager.handle_call), including parallel invocations and retries if they imply another invocation.
      • Candidate places:
        • In process_function_tools just before each handle_call(...).
        • Or centrally inside ToolManager.handle_call(...) to ensure consistent counting for all tool kinds.
    • Enforce the limit in a new check that runs before each tool invocation (see the sketch after this list):
      • Raise UsageLimitExceeded(f'The next tool call would exceed the tool_calls_limit of {limit} (tool_calls={usage.tool_calls})').
    • Keep the default of None (limit disabled) to preserve backwards compatibility.
    • Document that this counts actual tool executions, not tool-call messages that are skipped or replaced by “final result already processed” stubs.
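
    A sketch of the counting and the new check, using simplified stand-ins for the real RunUsage/UsageLimits classes (the method name check_before_tool_call is hypothetical, chosen to mirror the existing check_before_request):

      from dataclasses import dataclass

      from pydantic_ai.exceptions import UsageLimitExceeded

      @dataclass
      class RunUsage:  # simplified stand-in for pydantic_ai.usage.RunUsage
          requests: int = 0
          tool_calls: int = 0  # new: one increment per actual tool invocation

      @dataclass
      class UsageLimits:  # simplified stand-in for pydantic_ai.usage.UsageLimits
          request_limit: int | None = 50
          tool_calls_limit: int | None = None  # new: None keeps the limit disabled

          def check_before_tool_call(self, usage: RunUsage) -> None:
              """Raise `UsageLimitExceeded` if one more tool invocation would exceed the limit."""
              if self.tool_calls_limit is not None and usage.tool_calls >= self.tool_calls_limit:
                  raise UsageLimitExceeded(
                      f'The next tool call would exceed the tool_calls_limit of '
                      f'{self.tool_calls_limit} (tool_calls={usage.tool_calls})'
                  )

      # Central counting inside ToolManager.handle_call would then look roughly like:
      #     usage_limits.check_before_tool_call(ctx.state.usage)
      #     ctx.state.usage.tool_calls += 1
      #     ... invoke the tool ...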
  • Why this is needed

    • request_limit caps model turns, not tool calls. A single turn can perform many tool calls, so request_limit is not a precise guardrail for tool execution costs or runaway tool loops within a single turn.
  • Notes/edge cases

    • Clarify whether to count:
      • Output tool calls that do not result in handle_call when a final result is already present.
      • Deferred tool calls (likely not counted unless executed).
      • Retries: each execution attempt should count as a tool call.
    • Consider telemetry alignment by adding gen_ai.usage.details.tool_calls to OpenTelemetry attributes.
  • Docs

    • Current docs recommend request_limit to prevent “infinite tool calling,” which only limits turns.

Example Code
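
A minimal sketch of the gap, using FunctionModel to force ten tool calls in a single model turn (the noop tool and the exact counts are illustrative):

    from pydantic_ai import Agent
    from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, ToolCallPart
    from pydantic_ai.models.function import AgentInfo, FunctionModel
    from pydantic_ai.usage import UsageLimits

    calls = 0

    def model_fn(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
        # First turn: ask for ten tool calls at once; second turn: finish.
        if len(messages) == 1:
            return ModelResponse(parts=[ToolCallPart('noop', {'i': i}) for i in range(10)])
        return ModelResponse(parts=[TextPart('done')])

    agent = Agent(FunctionModel(model_fn))

    @agent.tool_plain
    def noop(i: int) -> str:
        global calls
        calls += 1
        return 'ok'

    # request_limit=2 permits both model requests, but all ten tool calls in the
    # first turn run unchecked; no current limit can stop them mid-turn.
    result = agent.run_sync('go', usage_limits=UsageLimits(request_limit=2))
    print(calls)  # all ten ran; request_limit never intervened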

Python, Pydantic AI & LLM client version

Python 3.13
Pydantic AI 0.7.2
