-
Notifications
You must be signed in to change notification settings - Fork 6.5k
Feat: add system prompt / tool caching to BedrockConverse #19737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...ntegrations/llms/llama-index-llms-bedrock-converse/llama_index/llms/bedrock_converse/base.py
Show resolved
Hide resolved
|
Coming back to @logan-markewich's comment:
how is this being addressed with the changes in this PR? If I enable the automatic insertion of cachepoints and go beyond the limit (e.g., 4 for Claude), I'll get errors, without any way to control the amount of cache points actually inserted. |
|
Not sure to understand, you said the PR does the opposite of what you want and approve it, I'm happy to update it to make it more compliant, if I understand what is required and this is actually possible with how Bedrock works... Regarding your question @mattref, this is not addressed I think. Not sure it was before this PR... if people were adding more than 4 messages with cache point, it would fail. Now, we're adding potentially 2 more. |
I'm actually not convinced that the implementation in this PR only adds "2 more" cache points. Per the implementation in this PR, one cache point is added per tool (line Lines 366 to 379 in cee8a70
and then for some reason another cache point is added at the very end (line Lines 471 to 476 in cee8a70
So, even if I have a single tool, I'm seeing this: {
// ....
"toolConfig": {
"tools": [
{
"toolSpec": { ... }
},
{
"cachePoint": { "type": "default" }
},
{
"cachePoint": { "type": "default" }
}
]
}
}and with multiple tools, I see this: {
// ....
"toolConfig": {
"tools": [
{
"toolSpec": { ... }
},
{
"cachePoint": { "type": "default" }
},
{
"toolSpec": { ... }
},
{
"cachePoint": { "type": "default" }
},
{
"cachePoint": { "type": "default" }
}
]
}
}Given, as a reference, the AWS documentation and other frameworks who have implemented Bedrock cache points, I do not believe there needs to be repeated cache points throughout the object, and especially should not have 2 next to each other at the very end. When using 4 tools, I immediately hit an error because of the excessive cache points. The implementation, for tool caching, could likely just add the cache point at the end: {
// ....
"toolConfig": {
"tools": [
{
"toolSpec": { ... }
},
{
"toolSpec": { ... }
},
{
"cachePoint": { "type": "default" }
}
]
}
}If there is a need for fine-grained cache points after each toolSpec, I think that might be a different conversation about how to allow the user to place these cache points themselves. @logan-markewich Would appreciate any input/insights on this, given that this PR was already merged. |
|
Good catch, there is indeed an indentation problem: we should not add the cachepoint for each tool, but one for all tools, after the loop. This is actually why I did a PR and why there is a review... For the other one, this is actually good. We are not supposed to add cache both in tools and in the BedrockConverse generic config. But I mean, we can work only with tools and SystemMessage if you want and remove the BedrockConverse parameters. Remove the commits from main branch and I'll do another PR, which would be more compliant with what you want (I think), and without this indentation bug. @logan-markewich @mattref |
|
@mattref is definitely correct. The package is already published so I cant "remove the commits from main" -- best to make a follow-up. I think I was a little frustrated and merged this unexpectedly lol, so my bad on that one The feature isn't advertised anywhere so it should be fine to make a follow-up PR with some fixes. @mattref it seems like you have a decent understanding of the bedrock API, would appreciate a PR if you have time. FYI as I mentioned several times, its already possible for the user to insert their own cache points into the input ChatMessage objects. This PR is really more of a convenience feature. |
|
Pretty sure the above PR fixes it |
|
Yes it's ok. As I said, indentation issue, the code was in the loop instead of after. Thanks 🙏 And as I already said the problem was not with messages but with system prompt. For tools it's more for convenience I agree. |
|
I would like to suggest some changes @logan-markewich @jeromevdl . there are some limitations with current implementation.
this pr allows only to cache the entirety of the system prompt with a boolean. This is too limiting. Instead we should use the cachePoint as suggested by @logan-markewich
can you incorporate those changes in the next pr @jeromevdl, or else can i open a pr in that direction |
|
Go for it @gringParis |
Description
Add system prompt caching and tool caching for BedrockConverse to reduce the number of tokens used.
Fixes #19733
Version Bump?
Did I bump the version in the
pyproject.tomlfile of the package I am updating? (Except for thellama-index-corepackage)Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
uv run make format; uv run make lintto appease the lint gods