Bug Description
I previously implemented a VectorIndexRetriever using LlamaIndex's built-in, in-memory vector index (no external vector database), with MMR mode and an mmr_threshold. It worked fine.
I then added ChromaDB and found that MMR mode works as long as I don't include the mmr_threshold. If I set mmr_threshold in the vector_store_kwargs argument, the query raises a TypeError.
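For reference, the setup that worked before adding ChromaDB looked roughly like this (a minimal sketch, assuming the default in-memory vector store; the './docs' folder, the top-k value, the threshold, and the query string are placeholders):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Assumes an embedding model and LLM are already configured in Settings.
documents = SimpleDirectoryReader('./docs').load_data()  # placeholder folder
index = VectorStoreIndex.from_documents(documents)

# With the built-in vector store, mmr_threshold is accepted and applied:
retriever = index.as_retriever(
    vector_store_query_mode='mmr',
    similarity_top_k=5,
    vector_store_kwargs={'mmr_threshold': 0.2},
)
nodes = retriever.retrieve('example query')  # placeholder query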
Version
llama-index-0.10.1
Steps to Reproduce
Here are the relevant parts of my code. If I comment out the line indicated by "# ERROR", then the code works.
import os

import chromadb

from llama_index.core import Settings
from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.indices.vector_store.base import VectorStoreIndex
from llama_index.core.indices.vector_store.retrievers.retriever import (
    VectorIndexRetriever,
)
from llama_index.core.query_engine.retriever_query_engine import (
    RetrieverQueryEngine,
)
from llama_index.core.storage import StorageContext
from llama_index.vector_stores.chroma.base import ChromaVectorStore


def run_query(
    question: str,
    vectorstore: str,
    top_k: int,
    mmr_threshold: float,
) -> RESPONSE_TYPE | None:
    '''
    Return an LLM response to an input query after doing a vector search.

    Args:
        question (str): The query to the LLM.
        vectorstore (str): Folder name of the vector database.
        top_k (int): Number of retrievals or citations to retrieve via
            vector search.
        mmr_threshold (float): A float between 0 and 1, for MMR search mode.
            Closer to 0 gives you more diversity in the retrievals.
            Closer to 1 gives you more relevance in the retrievals.

    Returns:
        RESPONSE_TYPE | None: If the vectorstore location exists, return the
            Response object produced by RetrieverQueryEngine, else return
            None.
    '''
    if not os.path.exists(vectorstore):
        print('Error: Vectorstore', vectorstore, 'not found!')
        return None
    else:
        # Instantiate a Chroma client, setting the storage folder location:
        client = chromadb.PersistentClient(path=vectorstore)
        # Instantiate a Chroma collection based on the client:
        chroma_collection = client.get_or_create_collection(vectorstore)
        # Instantiate a ChromaVectorStore based on the Chroma collection:
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        # Instantiate a storage context based on the ChromaVectorStore:
        storage_context = StorageContext.from_defaults(
            vector_store=vector_store
        )
        # Instantiate the LLM and embedding model (create_azure_models() is
        # a helper defined elsewhere in my code):
        llm, embedding = create_azure_models()
        # Add these 2 models to the LlamaIndex Settings:
        Settings.llm = llm
        Settings.embed_model = embedding
        index = VectorStoreIndex.from_vector_store(
            vector_store,
            storage_context=storage_context,
        )
        # Instantiate and configure a VectorIndexRetriever.
        # Note about parameters:
        #   similarity_top_k sets the number of retrievals (citations).
        #   mmr_threshold is a value between 0 and 1:
        #     closer to 0 gives you more diversity,
        #     closer to 1 gives you more relevance.
        #   If the data contains duplicated entries, set it lower (e.g. 0.2)
        #   so that the retriever will skip over search results that are
        #   identical or very similar and go for greater diversity.
        retriever = VectorIndexRetriever(
            index=index,
            similarity_top_k=top_k,
            vector_store_query_mode='mmr',
            vector_store_kwargs={'mmr_threshold': mmr_threshold},  # ERROR
        )
        # Instantiate RetrieverQueryEngine with the VectorIndexRetriever:
        query_engine = RetrieverQueryEngine(
            retriever=retriever
        )
        # Query the index:
        response = query_engine.query(question)
        return response
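Calling the function like this (the folder name and question are placeholders) produces the traceback below:

response = run_query(
    question='What does the report conclude?',  # placeholder query
    vectorstore='my_chroma_db',                 # placeholder folder name
    top_k=5,
    mmr_threshold=0.2,
)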
Relevant Logs/Tracebacks
  File ".../text_data_helpers.py", line 142, in run_query
    response = query_engine.query(question)
  File ".../lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py", line 40, in query
    return self._query(str_or_query_bundle)
  File ".../lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 186, in _query
    nodes = self.retrieve(query_bundle)
  File ".../lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 142, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
  File ".../lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 229, in retrieve
    nodes = self._retrieve(query_bundle)
  File ".../lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 94, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File ".../lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 170, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File ".../lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 292, in query
    results = self._collection.query(
TypeError: Collection.query() got an unexpected keyword argument 'mmr_threshold'
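Reading the traceback: VectorIndexRetriever forwards vector_store_kwargs verbatim into self._vector_store.query(query, **self._kwargs), and ChromaVectorStore.query() in turn passes them through to chromadb's Collection.query(), which has no mmr_threshold parameter, hence the TypeError. That is presumably why the same settings work with the built-in vector store but not with Chroma. The error can be reproduced against chromadb directly (a minimal sketch; the collection name and embedding values are placeholders):

import chromadb

client = chromadb.EphemeralClient()  # in-memory client, just for the demo
collection = client.get_or_create_collection('demo')  # placeholder name

# chromadb's Collection.query() accepts query_embeddings, query_texts,
# n_results, where, etc., but no mmr_threshold, so the extra keyword
# raises the same TypeError as in the traceback above:
collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],  # placeholder embedding
    n_results=1,
    mmr_threshold=0.2,
)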