
Enforce usage limits by exact tool call count (not just request/turn count) #2593

@tradeqvest

Description

  • Today, UsageLimitExceeded is enforced for:

    • Model request count (request_limit)
    • Token limits (input_tokens_limit, output_tokens_limit, total_tokens_limit)
  • There is no enforcement based on the number of tool calls executed. As a result, a single model turn that triggers many tool calls is not bounded by usage limits until the next model request is made. This is coarse when the goal is to cap the exact number of tool invocations.
  • Current behavior

    • Limits only check requests/tokens:

      def check_before_request(self, usage: RunUsage) -> None:
          """Raises a `UsageLimitExceeded` exception if the next request would exceed any of the limits."""
          request_limit = self.request_limit
          if request_limit is not None and usage.requests >= request_limit:
              raise UsageLimitExceeded(f'The next request would exceed the request_limit of {request_limit}')

          input_tokens = usage.input_tokens
          if self.input_tokens_limit is not None and input_tokens > self.input_tokens_limit:
              raise UsageLimitExceeded(
                  f'The next request would exceed the input_tokens_limit of {self.input_tokens_limit} ({input_tokens=})'
              )

          total_tokens = usage.total_tokens
          if self.total_tokens_limit is not None and total_tokens > self.total_tokens_limit:
              raise UsageLimitExceeded(
                  f'The next request would exceed the total_tokens_limit of {self.total_tokens_limit} ({total_tokens=})'
              )

    • Requests are incremented per model request:

      model_settings, model_request_parameters, message_history, run_context = await self._prepare_request(ctx)
      async with ctx.deps.model.request_stream(
          message_history, model_settings, model_request_parameters, run_context
      ) as streamed_response:
          self._did_stream = True
          ctx.state.usage.requests += 1

      model_settings, model_request_parameters, message_history, _ = await self._prepare_request(ctx)
      model_response = await ctx.deps.model.request(message_history, model_settings, model_request_parameters)
      ctx.state.usage.requests += 1

    • Tool calls are executed but not counted anywhere:

      async def process_function_tools(  # noqa: C901
          tool_manager: ToolManager[DepsT],
          tool_calls: list[_messages.ToolCallPart],
          final_result: result.FinalResult[NodeRunEndT] | None,
          ctx: GraphRunContext[GraphAgentState, GraphAgentDeps[DepsT, NodeRunEndT]],
          output_parts: list[_messages.ModelRequestPart],
          output_final_result: deque[result.FinalResult[NodeRunEndT]] = deque(maxlen=1),
      ) -> AsyncIterator[_messages.HandleResponseEvent]:
          """Process function (i.e., non-result) tool calls in parallel.

          Also add stub return parts for any other tools that need it.

          Because async iterators can't have return values, we use `output_parts` and `output_final_result` as output arguments.
          """
          tool_calls_by_kind: dict[ToolKind | Literal['unknown'], list[_messages.ToolCallPart]] = defaultdict(list)
          for call in tool_calls:
              tool_def = tool_manager.get_tool_def(call.tool_name)
              kind = tool_def.kind if tool_def else 'unknown'
              tool_calls_by_kind[kind].append(call)

  • Expected behavior

    • Provide a way to cap the exact number of tool calls in a run. Exceeding this cap should raise UsageLimitExceeded before invoking the tool that would exceed the limit (see the usage sketch below).
    • Note: it’s unclear whether current behavior matches the intended semantics; I’m unsure whether usage limits are meant to be enforced across tool calls (vs. requests). This issue proposes making tool-call caps explicit.
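
    A minimal usage sketch of the proposed API (tool_calls_limit is the hypothetical new field; everything else below is today's public API):

      from pydantic_ai import Agent
      from pydantic_ai.exceptions import UsageLimitExceeded
      from pydantic_ai.usage import UsageLimits

      agent = Agent('openai:gpt-4o')

      @agent.tool_plain
      def fetch_page(url: str) -> str:
          """A tool the model may call many times within a single turn."""
          return '...'

      try:
          result = agent.run_sync(
              'Crawl these ten pages',
              usage_limits=UsageLimits(tool_calls_limit=3),  # proposed field, not in UsageLimits today
          )
      except UsageLimitExceeded:
          ...  # raised before the fourth tool invocation runs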
  • Proposed solution

    • Add a tool-call counter to RunUsage and a corresponding limit in UsageLimits:
      • RunUsage.tool_calls: int = 0
      • UsageLimits.tool_calls_limit: int | None = None
    • Increment usage.tool_calls for every actual tool invocation (i.e., every call to tool_manager.handle_call), including parallel invocations and retries if they imply another invocation.
      • Candidate places:
        • In process_function_tools just before each handle_call(...).
        • Or centrally inside ToolManager.handle_call(...) to ensure consistent counting for all tool kinds.
    • Enforce the limit in a new check that runs before each tool invocation (see the sketch after this list):
      • Raise UsageLimitExceeded(f'The next tool call would exceed the tool_calls_limit of {limit} (tool_calls={usage.tool_calls})').
    • Keep the default of None (limit disabled) to preserve backwards compatibility.
    • Document that this counts actual tool executions, not tool-call messages that are skipped or replaced by “final result already processed” stubs.
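
    A sketch of the counting and the new check, using simplified stand-ins for the real RunUsage/UsageLimits classes (the method name check_before_tool_call is hypothetical, chosen to mirror the existing check_before_request):

      from dataclasses import dataclass

      from pydantic_ai.exceptions import UsageLimitExceeded

      @dataclass
      class RunUsage:  # simplified stand-in for pydantic_ai.usage.RunUsage
          requests: int = 0
          tool_calls: int = 0  # new: one increment per actual tool invocation

      @dataclass
      class UsageLimits:  # simplified stand-in for pydantic_ai.usage.UsageLimits
          request_limit: int | None = 50
          tool_calls_limit: int | None = None  # new: None keeps the limit disabled

          def check_before_tool_call(self, usage: RunUsage) -> None:
              """Raise `UsageLimitExceeded` if one more tool invocation would exceed the limit."""
              if self.tool_calls_limit is not None and usage.tool_calls >= self.tool_calls_limit:
                  raise UsageLimitExceeded(
                      f'The next tool call would exceed the tool_calls_limit of '
                      f'{self.tool_calls_limit} (tool_calls={usage.tool_calls})'
                  )

      # Central counting inside ToolManager.handle_call would then look roughly like:
      #     usage_limits.check_before_tool_call(ctx.state.usage)
      #     ctx.state.usage.tool_calls += 1
      #     ... invoke the tool ...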
  • Why this is needed

    • request_limit caps model turns, not tool calls. A single turn can perform many tool calls, so request_limit is not a precise guardrail for tool execution costs or runaway tool loops within a single turn.
  • Notes/edge cases

    • Clarify whether to count:
      • Output tool calls that do not result in handle_call when a final result is already present.
      • Deferred tool calls (likely not counted unless executed).
      • Retries: each execution attempt should count as a tool call.
    • Consider telemetry alignment by adding gen_ai.usage.details.tool_calls to OpenTelemetry attributes.
  • Docs

    • Current docs recommend request_limit to prevent “infinite tool calling,” which only limits turns.

Example Code
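
A minimal sketch of the gap, using FunctionModel to force ten tool calls in a single model turn (the noop tool and the exact counts are illustrative):

    from pydantic_ai import Agent
    from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, ToolCallPart
    from pydantic_ai.models.function import AgentInfo, FunctionModel
    from pydantic_ai.usage import UsageLimits

    calls = 0

    def model_fn(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
        # First turn: ask for ten tool calls at once; second turn: finish.
        if len(messages) == 1:
            return ModelResponse(parts=[ToolCallPart('noop', {'i': i}) for i in range(10)])
        return ModelResponse(parts=[TextPart('done')])

    agent = Agent(FunctionModel(model_fn))

    @agent.tool_plain
    def noop(i: int) -> str:
        global calls
        calls += 1
        return 'ok'

    # request_limit=2 permits both model requests, but all ten tool calls in the
    # first turn run unchecked; no current limit can stop them mid-turn.
    result = agent.run_sync('go', usage_limits=UsageLimits(request_limit=2))
    print(calls)  # all ten ran; request_limit never intervened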

Python, Pydantic AI & LLM client version

Python 3.13
Pydantic AI 0.7.2
