Fix Document truncation in FunctionTool._parse_tool_output #19585

meirk-brd · 2025-08-03T08:44:06Z

Problem

Document objects returned by tools are truncated to ~400 characters before they reach the agent, instead of being passed through in full. Agents then act on incomplete data, which can cause repeated tool calls (infinite loops) and degraded performance.

Root Cause

FunctionTool._parse_tool_output() calls str(raw_output), which invokes Document.__str__(). That method truncates the content to 350 characters for display purposes, but the truncated string is then passed on to the agent instead of being used for display only.
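
The 401-character figure reported in the Testing section below can be reproduced without the library. The following is a minimal sketch, assuming the display logic truncates the text to 350 characters (the last three being an ellipsis), wraps it at 70 columns, and prefixes a Doc ID line. StandInDocument, the local truncate_text, and the two constant values are stand-ins written for this illustration, not the library's code.

```python
import textwrap
import uuid

TRUNCATE_LENGTH = 350  # assumption: display limit named in the root cause above
WRAP_WIDTH = 70        # assumption: wrap width used for the display string


def truncate_text(text: str, max_length: int) -> str:
    """Cut text to max_length characters, ending in '...' when it was cut."""
    if len(text) <= max_length:
        return text
    return text[: max_length - 3] + "..."


class StandInDocument:
    """Hypothetical stand-in that mimics a display-oriented __str__."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.doc_id = str(uuid.uuid4())  # 36-character ID

    def get_content(self) -> str:
        # Full, untruncated content.
        return self.text

    def __str__(self) -> str:
        # Display view: truncated, wrapped, and prefixed with the ID.
        truncated = truncate_text(self.get_content().strip(), TRUNCATE_LENGTH)
        wrapped = textwrap.fill(f"Text: {truncated}\n", width=WRAP_WIDTH)
        return f"Doc ID: {self.doc_id}\n{wrapped}"


doc = StandInDocument("This is a test document with very long content. " * 20)
full_len = len(doc.get_content())
shown_len = len(str(doc))
print(f"get_content(): {full_len} characters")
print(f"str():         {shown_len} characters")
```

Under these assumptions, the Doc ID line (45 characters including the newline), the Text: label (6 characters), and the 350 truncated characters add up to the 401 characters seen in the tool output.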

Solution

Replace str(raw_output) with raw_output.get_content() for Document objects
Preserve str() fallback for non-Document objects
Maintain full backward compatibility

Testing

Before fix:

Document with 960 characters → Tool output: 401 characters (truncated)

After fix:

Document with 960 characters → Tool output: 960 characters (full content)

Impact

Fixes infinite agent loops with document-based tools (BrightData, web scrapers, etc.)
Improves agent performance and reliability
No breaking changes

Here is a simple test that demonstrates the problem:

from llama_index.core.schema import Document
from llama_index.core.tools.function_tool import FunctionTool
from llama_index.core.tools.types import ToolMetadata

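def test_document_truncation_bug():
    """Reproduce the bug: the tool's content blocks are shorter than the raw Document."""
    # 48 characters * 20 repetitions = 960 characters, well above the 350-character display limit
    long_content = "This is a test document with very long content. " * 20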
    doc = Document(text=long_content)
    
    print(f"Original document content length: {len(doc.get_content())} characters")
    print(f"Document str() length: {len(str(doc))} characters")
    print()
    
    def mock_tool_function() -> Document:
        return doc
    
    tool = FunctionTool(
        fn=mock_tool_function,
        metadata=ToolMetadata(name="test_tool", description="Test tool")
    )
    
    result = tool.call()
    
    print("=== RESULTS ===")
    print(f"Raw output (Document) content length: {len(result.raw_output.get_content())} characters")
    print(f"Tool output content length: {len(result.content)} characters")
    print(f"Tool output blocks[0] text length: {len(result.blocks[0].text)} characters")
    print()
    print("First 100 chars of each:")
    print(f"Raw output: {result.raw_output.get_content()[:100]}...")
    print(f"Tool content: {result.content[:100]}...")
    print(f"Blocks text: {result.blocks[0].text[:100]}...")
    
    assert len(result.raw_output.get_content()) > 350, "Raw output should have full content"
    assert len(result.blocks[0].text) < len(result.raw_output.get_content()), "BUG: Blocks are truncated!"
    
    print("\n BUG CONFIRMED: Tool blocks contain truncated content while raw_output has full content")

if __name__ == "__main__":
    test_document_truncation_bug()

Here are the results, which clearly show the issue:

Original document content length: 960 characters
Document str() length: 401 characters

=== RESULTS ===
Raw output (Document) content length: 960 characters
Tool output content length: 401 characters
Tool output blocks[0] text length: 401 characters

First 100 chars of each:
Raw output: This is a test document with very long content. This is a test document with very long content. This...
Tool content: Doc ID: 2f470677-7a7a-4886-9453-34d198b837f1
Text: This is a test document with very long content. T...
Blocks text: Doc ID: 2f470677-7a7a-4886-9453-34d198b837f1
Text: This is a test document with very long content. T...

 BUG CONFIRMED: Tool blocks contain truncated content while raw_output has full content

Here is an earlier PR where I thought the issue was with my test: #19301

But after debugging further I found this:

I then started debugging to see whether it is applied, even when calling the document, and found that the small change in my PR solves this issue.

@logan-markewich - Can you please take a look? I might be wrong, but it looks like something is off.

- Replace str(raw_output) with raw_output.get_content() for Document objects
- Prevents tool output truncation from 960+ chars to ~400 chars
- Fixes infinite agent loops caused by incomplete tool data
- Maintains backward compatibility for non-Document objects

Fixes issue where Document.__str__() truncation (intended for display) was affecting tool data processing, causing agents to receive incomplete information and repeatedly call tools.

Tested with Document objects containing 960+ characters:
- Before: Tool output truncated to 401 characters
- After: Tool output contains full 960+ characters

Tonel · 2025-08-03T08:56:59Z

Personally, I noticed that behavior too. The problem seems to be due to the truncation logic in the __str__ method of the Document class:

def __str__(self) -> str:
    source_text_truncated = truncate_text(
        self.get_content().strip(), TRUNCATE_LENGTH
    )
    # ...

Specifically, if the result of the tool is a Document instance, it gets truncated when the tool output is parsed in llama_index/core/tools/function_tool.py:

def _parse_tool_output(self, raw_output: Any) -> List[ContentBlock]:
    """Parse tool output into content blocks."""
    if isinstance(
        raw_output, (TextBlock, ImageBlock, AudioBlock, CitableBlock, CitationBlock)
    ):
        return [raw_output]
    elif isinstance(raw_output, list) and all(
        isinstance(
            item, (TextBlock, ImageBlock, AudioBlock, CitableBlock, CitationBlock)
        )
        for item in raw_output
    ):
        return raw_output
    else:
        return [TextBlock(text=str(raw_output))]  # <-- HERE

Hope this helps!

logan-markewich · 2025-08-04T18:38:36Z

llama-index-core/llama_index/core/tools/function_tool.py

            return raw_output
        else:
-            return [TextBlock(text=str(raw_output))]
+            return [TextBlock(text=raw_output.get_content())]


This will break any time the output is not a document/node object. We need an additional if/else check rather than doing this in the final else block (since here, the type is still "any")

Pushed a more type-safe fix. Going to add a test or two as well
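
The pushed commit itself is not shown in this thread. As a rough sketch of the kind of check described above, assuming stand-in classes instead of the real llama_index types, the Document case gets its own branch and everything else still goes through str(). StandInDocument, naive_output_to_text, and output_to_text are hypothetical names used only for illustration.

```python
from typing import Any


class StandInDocument:
    """Hypothetical stand-in for a Document/node type with a truncating __str__."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_content(self) -> str:
        # Full, untruncated content.
        return self._text

    def __str__(self) -> str:
        # Display-only view, cut to 350 characters for this illustration.
        return self._text[:350]


def naive_output_to_text(raw_output: Any) -> str:
    # Mirrors the one-line change under review: assumes every output has get_content().
    return raw_output.get_content()


def output_to_text(raw_output: Any) -> str:
    # Type-checked variant: full content for documents, str() for everything else.
    if isinstance(raw_output, StandInDocument):
        return raw_output.get_content()
    return str(raw_output)


doc = StandInDocument("x" * 960)
print(len(output_to_text(doc)))          # length of the full content
print(output_to_text({"status": "ok"}))  # non-document output still works

try:
    naive_output_to_text(42)
except AttributeError as exc:
    print(f"naive version fails on non-documents: {exc}")
```

Keeping str() as the fallback means existing tools that return plain strings, numbers, or dicts behave as before, while document-like outputs are no longer routed through the display string.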

dosubot bot added the size:XS label (this PR changes 0-9 lines, ignoring generated files) Aug 3, 2025

logan-markewich reviewed Aug 4, 2025

logan-markewich added 2 commits August 4, 2025 12:42

type-safe fix

a5f9aff

add tests

ccc96ca

dosubot bot added the size:M label (this PR changes 30-99 lines, ignoring generated files) and removed the size:XS label (this PR changes 0-9 lines, ignoring generated files) Aug 4, 2025

logan-markewich merged commit 1e02c7a into run-llama:main Aug 5, 2025
8 of 11 checks passed
