
Conversation

@jeromevdl
Contributor

Description

Add system prompt caching and tool caching for BedrockConverse to reduce the number of tokens used.

Fixes #19733

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Aug 25, 2025
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 4, 2025
@logan-markewich logan-markewich enabled auto-merge (squash) September 4, 2025 21:47
@logan-markewich logan-markewich merged commit 1da7043 into run-llama:main Sep 4, 2025
11 checks passed
@mattref
Contributor

mattref commented Sep 5, 2025

Coming back to @logan-markewich's comment:

If we want to auto-insert cachepoints, we should at most insert one (otherwise we risk placing too many and creating errors, the limit is 4)

How is this being addressed by the changes in this PR? If I enable the automatic insertion of cache points and go beyond the limit (e.g., 4 for Claude), I'll get errors, without any way to control the number of cache points actually inserted.
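One way to guard against that limit would be to count the cache points before calling Converse. The following is a hypothetical sketch, not code from this PR; the helper names and the hard-coded limit of 4 (taken from the comment above) are assumptions:

```python
# Hypothetical guard, not part of this PR: count cachePoint entries in a
# Converse toolConfig and refuse to build a request that exceeds the model limit.
MAX_CACHE_POINTS = 4  # e.g. the limit mentioned above for Claude models on Bedrock


def count_tool_cache_points(tool_config: dict) -> int:
    """Count cachePoint blocks in tool_config["tools"]."""
    return sum(1 for t in tool_config.get("tools", []) if "cachePoint" in t)


def validate_cache_points(tool_config: dict) -> None:
    """Raise before sending a request Bedrock would reject anyway."""
    n = count_tool_cache_points(tool_config)
    if n > MAX_CACHE_POINTS:
        raise ValueError(f"{n} cache points exceed the limit of {MAX_CACHE_POINTS}")


tool_config = {
    "tools": [
        {"toolSpec": {"name": "a"}},
        {"cachePoint": {"type": "default"}},
    ]
}
validate_cache_points(tool_config)  # only 1 cache point, so no error
```

A check like this would turn the opaque Bedrock-side error into an actionable client-side one.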

@jeromevdl
Contributor Author

I'm not sure I understand: you said the PR does the opposite of what you want, yet approved it. I'm happy to update it to be more compliant, if I understand what is required and it is actually possible given how Bedrock works...

Regarding your question @mattref, I don't think this is addressed. I'm not sure it was before this PR either: if people added more than 4 messages with cache points, it would fail. Now we're potentially adding 2 more.

@mattref
Contributor

mattref commented Sep 5, 2025

Regarding your question @mattref, I don't think this is addressed. I'm not sure it was before this PR either: if people added more than 4 messages with cache points, it would fail. Now we're potentially adding 2 more.

I'm actually not convinced that the implementation in this PR only adds "2 more" cache points. Per the implementation in this PR, one cache point is added per tool (line 379):

```python
for tool in tools:
    tool_name, tool_description = tool.metadata.name, tool.metadata.description
    if not tool_name:
        raise ValueError(f"Tool {tool} does not have a name.")
    tool_dict = {
        "name": tool_name,
        "description": tool_description,
        # get the schema of the tool's input parameters in the format expected by AWS Bedrock Converse
        "inputSchema": {"json": tool.metadata.get_parameters_dict()},
    }
    converse_tools.append({"toolSpec": tool_dict})
    if tool_caching:
        converse_tools.append({"cachePoint": {"type": "default"}})
```

and then for some reason another cache point is added at the very end (lines 474-476):

```python
if tool_config := kwargs.get("tools"):
    converse_kwargs["toolConfig"] = tool_config
    if tool_caching and "tools" in converse_kwargs["toolConfig"]:
        converse_kwargs["toolConfig"]["tools"].append(
            {"cachePoint": {"type": "default"}}
        )
```

So, even if I have a single tool, I'm seeing this:

```jsonc
{
  // ....
  "toolConfig": {
    "tools": [
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      },
      {
        "cachePoint": { "type": "default" }
      }
    ]
  }
}
```

and with multiple tools, I see this:

```jsonc
{
  // ....
  "toolConfig": {
    "tools": [
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      },
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      },
      {
        "cachePoint": { "type": "default" }
      }
    ]
  }
}
```

Given, as a reference, the AWS documentation and other frameworks that have implemented Bedrock cache points, I do not believe there need to be repeated cache points throughout the object, and there certainly should not be two next to each other at the very end.

When using 4 tools, I immediately hit an error because of the excessive cache points.

The implementation, for tool caching, could likely just add the cache point at the end:

```jsonc
{
  // ....
  "toolConfig": {
    "tools": [
      {
        "toolSpec": { ... }
      },
      {
        "toolSpec": { ... }
      },
      {
        "cachePoint": { "type": "default" }
      }
    ]
  }
}
```

If there is a need for fine-grained cache points after each toolSpec, I think that might be a different conversation about how to allow the user to place these cache points themselves.
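A minimal sketch of that suggested fix, with the single cache point appended after the loop instead of once per tool. This is a hypothetical helper, not the PR's actual code: tools are reduced to plain dicts here for illustration, whereas the real code reads names and schemas from tool.metadata:

```python
# Hypothetical sketch of the suggested fix: append exactly one cachePoint
# after all toolSpec entries, instead of one per tool inside the loop.
def tools_to_converse_tools(tools: list, tool_caching: bool = False) -> list:
    converse_tools = []
    for tool in tools:
        if not tool.get("name"):
            raise ValueError(f"Tool {tool} does not have a name.")
        converse_tools.append({"toolSpec": tool})
    # a single cache point covering all tool definitions
    if tool_caching and converse_tools:
        converse_tools.append({"cachePoint": {"type": "default"}})
    return converse_tools


out = tools_to_converse_tools([{"name": "a"}, {"name": "b"}], tool_caching=True)
```

With this shape, the number of cache points consumed by tools stays at one regardless of how many tools are registered.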

@logan-markewich Would appreciate any input/insights on this, given that this PR was already merged.

@jeromevdl
Contributor Author

Good catch: there is indeed an indentation problem. We should not add the cachePoint for each tool, but a single one for all tools, after the loop. This is actually why I opened a PR and why there was a review...

For the other one, this is actually intended: we are not supposed to add a cache point both in the tools and in the BedrockConverse generic config.

That said, we can work only with tools and SystemMessage if you want, and remove the BedrockConverse parameters. Revert the commits from the main branch and I'll do another PR, which would be more compliant with what you want (I think), and without this indentation bug. @logan-markewich @mattref

@logan-markewich
Collaborator

logan-markewich commented Sep 7, 2025

@mattref is definitely correct. The package is already published, so I can't "remove the commits from main" -- best to make a follow-up. I think I was a little frustrated and merged this unexpectedly lol, so my bad on that one.

The feature isn't advertised anywhere, so it should be fine to make a follow-up PR with some fixes. @mattref, it seems like you have a decent understanding of the Bedrock API; I would appreciate a PR if you have time.

FYI, as I mentioned several times, it's already possible for the user to insert their own cache points into the input ChatMessage objects. This PR is really more of a convenience feature.

@logan-markewich
Collaborator

Pretty sure the above PR fixes it

@jeromevdl
Contributor Author

jeromevdl commented Sep 7, 2025

Yes, it's OK. As I said, it was an indentation issue: the code was in the loop instead of after it. Thanks 🙏

And as I already said, the problem was not with messages but with the system prompt. For tools, it's more for convenience, I agree.

@gringParis
Contributor

gringParis commented Sep 11, 2025

I would like to suggest some changes @logan-markewich @jeromevdl, as there are some limitations in the current implementation.

Without this PR, using the existing cachePoint as @logan-markewich initially suggested works for the messages but not for the system prompt: the "cachePoint": {"type": "default"} block is not added to system messages when calling Bedrock. The system prompt is probably one of the most useful things to cache, and the boto3 Bedrock Converse API accepts an array of system blocks rather than a single one. The current implementation merges all system messages into a single one, which makes it impossible to cache only one part of the system prompt. With the introduction of prompt caching, it makes sense to have several system messages; Bedrock supports that behaviour, and LlamaIndex should too.

This PR only allows caching the entirety of the system prompt with a boolean, which is too limiting. Instead, we should use the cachePoint as suggested by @logan-markewich.

Additionally, we need to:

  • stop concatenating system messages into a single one in messages_to_converse_messages when there is a cache point; in that case, follow-up system messages should be appended to a list
  • make the system_prompt param in converse_with_retry_async a list of dicts

Can you incorporate those changes in the next PR @jeromevdl? Or else, can I open a PR in that direction? Otherwise I'll do a custom implementation, but I think the community would benefit from it...
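For reference, the system-parameter shape being described here is a list of content blocks where a cachePoint marks the prefix to cache. The following is a sketch of the request payload only, not llama-index code; the helper name and prompt strings are made up for illustration:

```python
# Sketch of a Bedrock Converse "system" list with a cache point after the
# static part of the prompt; the blocks before the cachePoint form the
# cacheable prefix, while later blocks can vary per request.
def build_system_blocks(static_prompt: str, dynamic_prompt: str) -> list:
    return [
        {"text": static_prompt},
        {"cachePoint": {"type": "default"}},  # cache everything above this point
        {"text": dynamic_prompt},             # varies per request, not cached
    ]


system = build_system_blocks(
    "You are a helpful assistant with the following long instructions...",
    "Additional context for this specific request...",
)
```

This is exactly why merging all system messages into one string is limiting: once merged, there is no seam left at which to place the cachePoint block.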

@logan-markewich
Collaborator

Go for it @gringParis

frankiekim5 pushed a commit to frankiekim5/bedrock-agentcore-memory that referenced this pull request Sep 24, 2025


Development

Successfully merging this pull request may close these issues.

[Feature Request]: Add system prompt / tools caching in BedrockConverse
