-
Notifications
You must be signed in to change notification settings - Fork 6.5k
Adding a very simple implementation of an embeddings cache #18864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @AstraBert, Thanks for taking care of this. However, I believe that it would be better to accept a generic |
| if collection in self._data and len(collection) > self._maximum_data_point: | ||
| if self._strict: | ||
| raise ValueError(f"Exceeded the maximum number of data points that can be uploaded to collection {collection}") | ||
| first_key = next(iter(self._data[collection].keys())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this relies on the dict being ordered 👀 (I can't remember how dict ordering works in python or if OrderedDict is needed or not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude suggests that python dicts follow the order of the addition of the keys (since python 3.7) , but maybe we can follow @Florian-BACHO's suggestion and revert this adding support for BaseKVStore, and avoid modifying the SimpleKVStore at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea supporting the BaseKVStore makes total sense (I missed that it was hardcoded to SimpleKVStore)
|
For some reasons ruff-format passes locally on pre-commit but does not pass here @logan-markewich :/ |
|
I'd say with this last commit this PR can be merged? @logan-markewich |
Description
Added a very simple implementation of an embeddings cache that:
BaseEmbeddingclass with anembeddings_cache, that is aSimpleKVStoreSimpleKVStorehas now a maximum of data points that can be uploaded to it, to avoid the cache swamping the memory (since the cache is never cleared at runtime).Example of a cache:
{ "embeddings": { "text-to-be-embedded": { "uuid": [0, 0, 1, 0.3] }, "text-to-be-embedded-2": { "uuid": [0, 0, 1, 0.3] }, } }Fixes #18849
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
uv run make format; uv run make lintto appease the lint gods