Description
Feature Description
Ollama models now expose a thinking field in their streaming responses, providing insight into the model's intermediate reasoning steps. This was introduced in the Ollama class here.
I am requesting that the FunctionAgent (specifically in function_agent.py) and the AgentStream class be updated to include the thinking field from the LLM response in the streamed events. This would allow downstream applications and UIs to display the model's "thinking" process in real time.
```python
ctx.write_event_to_stream(
    AgentStream(
        delta=last_chat_response.delta or "",
        response=last_chat_response.message.content or "",
        tool_calls=tool_calls or [],
        raw=raw,
        current_agent_name=self.name,
        thinking_delta=last_chat_response.additional_kwargs.get("thinking_delta", ""),
    )
)
```
Reason
Currently, the AgentStream object does not include or expose the thinking field, even though it is available in the Ollama LLM response (last_chat_response.additional_kwargs["thinking_delta"]). As a result, there is no way for downstream applications to display this information. Since LlamaIndex is a framework supporting multiple LLM providers, and not all providers expose a thinking field, this field is not currently part of the standard agent streaming interface.
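To illustrate the intended consumer side, here is a minimal sketch of how a UI loop might separate answer tokens from "thinking" tokens if the proposed field existed. The `AgentStream` class below is a stand-in stub, not the real llama-index event, and `thinking_delta` is the hypothetical field this issue requests:

```python
from dataclasses import dataclass

# Stub standing in for llama-index's AgentStream event, extended with the
# proposed (hypothetical) thinking_delta field.
@dataclass
class AgentStream:
    delta: str = ""
    response: str = ""
    current_agent_name: str = ""
    thinking_delta: str = ""  # proposed field, not yet in the library

def render(events):
    """Accumulate visible answer tokens and 'thinking' tokens separately,
    so a UI can show reasoning in one pane and the answer in another."""
    answer_parts, thinking_parts = [], []
    for ev in events:
        if ev.thinking_delta:
            thinking_parts.append(ev.thinking_delta)
        if ev.delta:
            answer_parts.append(ev.delta)
    return "".join(answer_parts), "".join(thinking_parts)

# Simulated stream: thinking tokens arrive first, then answer tokens.
events = [
    AgentStream(thinking_delta="Considering tool use... "),
    AgentStream(thinking_delta="No tool needed."),
    AgentStream(delta="Hello"),
    AgentStream(delta=", world!"),
]
answer, thinking = render(events)
print(answer)    # -> Hello, world!
print(thinking)  # -> Considering tool use... No tool needed.
```

Because providers that do not emit a thinking field would simply leave `thinking_delta` empty, existing consumers that ignore the field would be unaffected.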
Value of Feature
Exposing the thinking field in AgentStream would enable developers to build interactive UIs that can show the model's intermediate reasoning ("thinking") as it streams, improving transparency and user experience.