Ensure you're using the same tokenizer

  • Use OpenAI’s tokenizers that correspond to the deployed model:
    • GPT-3.5/4 “turbo” era → cl100k_base
    • GPT-4o family → the newer “o*” encodings (e.g., o200k_base).
      OpenAI documents the mapping and provides an official tokenizer and cookbook examples. (https://platform.openai.com/tokenizer)
  • Count exactly what you send: the serialized chat messages (system, user, assistant, and tool calls). Function/tool-call arguments and schemas are tokens too.
  • For streaming, Azure bills for the prompt plus all tokens generated up to the moment you stop. (The billing/metrics documentation defines this explicitly.)
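The counting recipe above can be sketched in the style of the OpenAI cookbook's `num_tokens_from_messages` example. The per-message overhead constants below are assumptions that vary by model, and `encode` is whichever tokenizer matches your deployment (e.g. `tiktoken.encoding_for_model("gpt-4o").encode`):

```python
from typing import Callable, Dict, List

def count_chat_tokens(
    messages: List[Dict[str, str]],
    encode: Callable[[str], List[int]],
    tokens_per_message: int = 3,  # chat-format framing per message (model-dependent assumption)
    reply_primer: int = 3,        # tokens priming the assistant's reply (assumption)
) -> int:
    """Estimate prompt tokens for a chat request, cookbook-style.

    Every field that gets serialized (role, content, tool-call
    arguments, ...) must pass through the tokenizer.
    """
    total = 0
    for message in messages:
        total += tokens_per_message
        for value in message.values():
            total += len(encode(value))
    return total + reply_primer
```

With tiktoken installed you would pass `tiktoken.encoding_for_model("<your model>").encode` as `encode`; for an Azure deployment, first map the deployment to its underlying model, since the deployment name alone doesn't determine the encoding.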

How to be certain?

  • On the client: use a tokenizer that matches your model and count the exact s…
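One way to be certain is to reconcile the client-side count against what the service actually reports: a non-streaming chat completion response carries a `usage` object with `prompt_tokens` and `completion_tokens`. A minimal reconciliation helper (the response here is a plain dict standing in for the SDK object, using the standard usage field names):

```python
from typing import Any, Dict

def reconcile_usage(local_prompt_count: int, response: Dict[str, Any]) -> int:
    """Difference between a local token count and the service-reported
    prompt_tokens; 0 means the tokenizers agree."""
    reported = response["usage"]["prompt_tokens"]
    return local_prompt_count - reported
```

A persistent non-zero difference usually means the wrong encoding (e.g. cl100k_base against a GPT-4o deployment) or uncounted fields such as tool schemas. For streaming, you can ask for usage in the final chunk (`stream_options={"include_usage": True}` in the OpenAI Python SDK) and reconcile against that instead.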

Answer selected by Magnuti
Category: Q&A