-
Notifications
You must be signed in to change notification settings - Fork 0
Improve add_documents CLI validation and embedding configuration #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR modernizes the CLI interface for adding documents to a Chroma collection by replacing manual argument parsing with argparse and adding comprehensive input validation. The changes improve error handling and ensure the embedding function is properly configured for the collection.
- Replaced manual
sys.argvparsing with argparse for better CLI handling - Added JSON validation and file input support for CLI arguments
- Configured OpenAI embedding function when creating/retrieving the collection
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| def add_to_openai_collection(collection, documents, metadatas, ids): | ||
| """Insert the provided payload into ``collection`` and report basic failures.""" |
Copilot
AI
Oct 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function lacks type hints while other functions in the file have been updated with proper typing. Add type annotations for consistency and better maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codex fix comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
- Quoted the Chroma client and collection annotations so the module no longer dereferences unavailable chromadb internals at import time. add_documents.pyL38-L46
- Added explicit typing to
add_to_openai_collectioninputs for consistency with the rest of the module. add_documents.pyL94-L110
Testing
⚠️ No automated tests were run (not requested).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
add_documents.py
Outdated
| def create_or_get_collection(client: chromadb.ClientAPI, | ||
| embedding_function: embedding_functions.OpenAIEmbeddingFunction | ||
| ) -> chromadb.api.models.Collection.Collection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard type annotations that access chromadb internals
The new annotations in create_or_get_collection dereference chromadb.ClientAPI and chromadb.api.models.Collection.Collection at definition time. Because from __future__ import annotations is not used, these expressions are evaluated when the module is imported and the top-level chromadb package does not expose ClientAPI (and may not preload chromadb.api.models). Importing the CLI now raises AttributeError before any logic runs. Quote these annotations or import the specific types from chromadb.api so the script can execute.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codex fix comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
- Quoted the Chroma client and collection type annotations to avoid runtime attribute lookups when importing the CLI module. add_documents.pyL3-L7 add_documents.pyL38-L46
- Added explicit list-based type hints for the payload insertion helper to align with the validated data structures. add_documents.pyL94-L109
Testing
⚠️ Not run (not requested).
Co-authored-by: Copilot <[email protected]>
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68e42c0e7a0c83288b60fe361a987144