
Conversation

@jeromevdl
Contributor

Description

Add system prompt caching and tool caching for BedrockConverse to reduce the number of tokens used.

Fixes #19733

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Aug 25, 2025
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 4, 2025
@logan-markewich logan-markewich enabled auto-merge (squash) September 4, 2025 21:47
@logan-markewich logan-markewich merged commit 1da7043 into run-llama:main Sep 4, 2025
11 checks passed
@mattref
Contributor

mattref commented Sep 5, 2025

Coming back to @logan-markewich's comment:

If we want to auto-insert cachepoints, we should at most insert one (otherwise we risk placing too many and creating errors, the limit is 4)

How is this being addressed by the changes in this PR? If I enable the automatic insertion of cache points and go beyond the limit (e.g., 4 for Claude), I'll get errors, without any way to control the number of cache points actually inserted.
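One way to guard against that limit would be to count the cache points before calling Converse. The following is a hypothetical sketch, not code from this PR; the helper names and the hard-coded limit of 4 (taken from the comment above) are assumptions:

```python
# Hypothetical guard, not part of this PR: count cachePoint entries in a
# Converse toolConfig and refuse to build a request that exceeds the model limit.
MAX_CACHE_POINTS = 4  # e.g. the limit mentioned above for Claude models on Bedrock


def count_tool_cache_points(tool_config: dict) -> int:
    """Count cachePoint blocks in tool_config["tools"]."""
    return sum(1 for t in tool_config.get("tools", []) if "cachePoint" in t)


def validate_cache_points(tool_config: dict) -> None:
    """Raise before sending a request Bedrock would reject anyway."""
    n = count_tool_cache_points(tool_config)
    if n > MAX_CACHE_POINTS:
        raise ValueError(f"{n} cache points exceed the limit of {MAX_CACHE_POINTS}")


tool_config = {
    "tools": [
        {"toolSpec": {"name": "a"}},
        {"cachePoint": {"type": "default"}},
    ]
}
validate_cache_points(tool_config)  # only 1 cache point, so no error
```

A check like this would turn the opaque Bedrock-side error into an actionable client-side one.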

@jeromevdl
Contributor Author

I'm not sure I understand: you said the PR does the opposite of what you want, yet approved it. I'm happy to update it to be more compliant, if I understand what is required and it is actually possible given how Bedrock works...

Regarding your question @mattref, I don't think this is addressed. I'm not sure it was before this PR either: if people added more than 4 messages with cache points, it would fail. Now we're potentially adding 2 more.

@mattref
Contributor

mattref commented Sep 5, 2025

Regarding your question @mattref, I don't think this is addressed. I'm not sure it was before this PR either: if people added more than 4 messages with cache points, it would fail. Now we're potentially adding 2 more.

I'm actually not convinced that the implementation in this PR only adds "2 more" cache points. Per the implementation in this PR, one cache point is added per tool (line 379):

```python
for tool in tools:
    tool_name, tool_description = tool.metadata.name, tool.metadata.description
    if not tool_name:
        raise ValueError(f"Tool {tool} does not have a name.")
    tool_dict = {
        "name": tool_name,
        "description": tool_description,
        # get the schema of the tool's input parameters in the format expected by AWS Bedrock Converse
        "inputSchema": {"json": tool.metadata.get_parameters_dict()},
    }
    converse_tools.append({"toolSpec": tool_dict})
    if tool_caching:
        converse_tools.append({"cachePoint": {"type": "default"}})
```

and then for some reason another cache point is added at the very end (lines 474-476):

```python
if tool_config := kwargs.get("tools"):
    converse_kwargs["toolConfig"] = tool_config
    if tool_caching and "tools" in converse_kwargs["toolConfig"]:
        converse_kwargs["toolConfig"]["tools"].append(
            {"cachePoint": {"type": "default"}}
        )
```

So, even if I have a single tool, I'm seeing this:

```jsonc
{
  // ....
  "toolConfig": {
    "tools": [
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      },
      {
        "cachePoint": { "type": "default" }
      }
    ]
  }
}
```

and with multiple tools, I see this:

```jsonc
{
  // ....
  "toolConfig": {
    "tools": [
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      },
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      },
      {
        "cachePoint": { "type": "default" }
      }
    ]
  }
}
```

Given, as a reference, the AWS documentation and other frameworks that have implemented Bedrock cache points, I do not believe there need to be repeated cache points throughout the object, and there certainly should not be two next to each other at the very end.

When using 4 tools, I immediately hit an error because of the excessive cache points.

The implementation, for tool caching, could likely just add the cache point at the end:

```jsonc
{
  // ....
  "toolConfig": {
    "tools": [
      {
        "toolSpec": { ... }
      },
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      }
    ]
  }
}
```

If there is a need for fine-grained cache points after each toolSpec, I think that might be a different conversation about how to allow the user to place these cache points themselves.
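A minimal sketch of that suggested fix, with the single cache point appended after the loop instead of once per tool. This is a hypothetical helper, not the PR's actual code: tools are reduced to plain dicts here for illustration, whereas the real code reads names and schemas from tool.metadata:

```python
# Hypothetical sketch of the suggested fix: append exactly one cachePoint
# after all toolSpec entries, instead of one per tool inside the loop.
def tools_to_converse_tools(tools: list, tool_caching: bool = False) -> list:
    converse_tools = []
    for tool in tools:
        if not tool.get("name"):
            raise ValueError(f"Tool {tool} does not have a name.")
        converse_tools.append({"toolSpec": tool})
    # a single cache point covering all tool definitions
    if tool_caching and converse_tools:
        converse_tools.append({"cachePoint": {"type": "default"}})
    return converse_tools


out = tools_to_converse_tools([{"name": "a"}, {"name": "b"}], tool_caching=True)
```

With this shape, the number of cache points consumed by tools stays at one regardless of how many tools are registered.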

@logan-markewich Would appreciate any input/insights on this, given that this PR was already merged.

@jeromevdl
Contributor Author

Good catch: there is indeed an indentation problem. We should not add the cachePoint for each tool, but a single one for all tools, after the loop. This is actually why I opened a PR and why there was a review...

For the other one, this is actually intended: we are not supposed to add a cache point both in the tools and in the BedrockConverse generic config.

That said, we can work only with tools and SystemMessage if you want, and remove the BedrockConverse parameters. Revert the commits from the main branch and I'll do another PR, which would be more compliant with what you want (I think), and without this indentation bug. @logan-markewich @mattref

@logan-markewich
Collaborator

logan-markewich commented Sep 7, 2025

@mattref is definitely correct. The package is already published, so I can't "remove the commits from main" -- best to make a follow-up. I think I was a little frustrated and merged this unexpectedly lol, so my bad on that one.

The feature isn't advertised anywhere, so it should be fine to make a follow-up PR with some fixes. @mattref, it seems like you have a decent understanding of the Bedrock API; I would appreciate a PR if you have time.

FYI, as I mentioned several times, it's already possible for the user to insert their own cache points into the input ChatMessage objects. This PR is really more of a convenience feature.

@logan-markewich
Collaborator

Pretty sure the above PR fixes it

@jeromevdl
Contributor Author

jeromevdl commented Sep 7, 2025

Yes, it's OK. As I said, it was an indentation issue: the code was in the loop instead of after it. Thanks 🙏

And as I already said, the problem was not with messages but with the system prompt. For tools, it's more for convenience, I agree.

@gringParis
Contributor

gringParis commented Sep 11, 2025

I would like to suggest some changes @logan-markewich @jeromevdl, as there are some limitations in the current implementation.

Without this PR, using the existing cachePoint as @logan-markewich initially suggested works for the messages but not for the system prompt: the "cachePoint": {"type": "default"} block is not added to system messages when calling Bedrock. The system prompt is probably one of the most useful things to cache, and the boto3 Bedrock Converse API accepts an array of system blocks rather than a single one. The current implementation merges all system messages into a single one, which makes it impossible to cache only one part of the system prompt. With the introduction of prompt caching, it makes sense to have several system messages; Bedrock supports that behaviour, and LlamaIndex should too.

This PR only allows caching the entirety of the system prompt with a boolean, which is too limiting. Instead, we should use the cachePoint as suggested by @logan-markewich.

Additionally, we need to:

  • stop concatenating system messages into a single one in messages_to_converse_messages when there is a cache point; in that case, follow-up system messages should be appended to a list
  • make the system_prompt param in converse_with_retry_async a list of dicts

Can you incorporate those changes in the next PR @jeromevdl? Or else, can I open a PR in that direction? Otherwise I'll do a custom implementation, but I think the community would benefit from it...
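For reference, the system-parameter shape being described here is a list of content blocks where a cachePoint marks the prefix to cache. The following is a sketch of the request payload only, not llama-index code; the helper name and prompt strings are made up for illustration:

```python
# Sketch of a Bedrock Converse "system" list with a cache point after the
# static part of the prompt; the blocks before the cachePoint form the
# cacheable prefix, while later blocks can vary per request.
def build_system_blocks(static_prompt: str, dynamic_prompt: str) -> list:
    return [
        {"text": static_prompt},
        {"cachePoint": {"type": "default"}},  # cache everything above this point
        {"text": dynamic_prompt},             # varies per request, not cached
    ]


system = build_system_blocks(
    "You are a helpful assistant with the following long instructions...",
    "Additional context for this specific request...",
)
```

This is exactly why merging all system messages into one string is limiting: once merged, there is no seam left at which to place the cachePoint block.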

@logan-markewich
Collaborator

Go for it @gringParis

frankiekim5 pushed a commit to frankiekim5/bedrock-agentcore-memory that referenced this pull request Sep 24, 2025


Development

Successfully merging this pull request may close these issues.

[Feature Request]: Add system prompt / tools caching in BedrockConverse
