This project is a Multi-Modal Search Engine built on OpenAI's CLIP, with a Flask API for the backend and HTML/CSS for the frontend web application.
It provides a clean web interface where users enter a text query and the system retrieves the images most relevant to that description, using the CLIP architecture (see the [paper](https://arxiv.org/abs/2103.00020)).
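At its core, the retrieval step embeds the query text and the candidate images into the same vector space and ranks images by similarity. Below is a minimal sketch of that idea using the Hugging Face port of CLIP; the model name and image file names are illustrative, not this project's actual configuration.

```python
# Minimal sketch of CLIP-based text-to-image matching (illustrative, not the
# repo's exact code). Model name and image paths are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ["cat.jpg", "beach.jpg"]]  # hypothetical files
inputs = processor(text=["a photo of a cat"], images=images,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text[i, j] scores how well image j matches text query i;
# the highest-scoring image is the best match for the query.
scores = outputs.logits_per_text.softmax(dim=-1)
print(scores.argmax(dim=-1))
```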
- This video demonstrates how to use our project's main feature.
- Sample data of 130 images is already included, or
- see the video, or
- prepare your own data as follows (a sketch of the embedding step appears after this list):
  - Place your images in `src/minidata`
  - Run the notebook in `src/image-processor`
  - Move the data in `src/image_embeddings` and the data in `src/minidata` to `flaskapp/image_embeddings` and `flaskapp/static` respectively (caution: transfer the data, not the directories)
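The sketch below shows what the embedding step conceptually does, assuming the notebook encodes every image in `src/minidata` with CLIP and saves the vectors for the Flask app to search; the model name and output file name are assumptions, not the repo's exact code.

```python
# Assumed shape of the image-embedding step: encode each image in src/minidata
# and persist the vectors. Output file name is hypothetical.
import os
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

names, vectors = [], []
for fname in sorted(os.listdir("src/minidata")):
    if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
        continue  # skip non-image files
    image = Image.open(os.path.join("src/minidata", fname)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feat = model.get_image_features(**inputs)
    # L2-normalise so a dot product later equals cosine similarity
    vectors.append(feat / feat.norm(dim=-1, keepdim=True))
    names.append(fname)

torch.save({"names": names, "vectors": torch.cat(vectors)},
           "src/image_embeddings/embeddings.pt")
```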
- Multi-Modal Search: Users can input textual descriptions of images to retrieve relevant images.
- Intuitive Web Interface: The frontend is built with HTML/CSS to provide a user-friendly experience.
- Scalable Backend: Flask API serves as the backend, handling requests and interacting with the CLIP model.
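To make the backend's role concrete, here is a hypothetical sketch of a search endpoint; the route name, query parameter, response shape, and embeddings file are assumptions, not the repo's actual API.

```python
# Hypothetical Flask search endpoint: embed the text query, rank precomputed
# image embeddings by cosine similarity, return the top file names.
import torch
from flask import Flask, jsonify, request
from transformers import CLIPModel, CLIPProcessor

app = Flask(__name__)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
store = torch.load("image_embeddings/embeddings.pt")  # assumed file name

@app.route("/search")
def search():
    query = request.args.get("q", "")
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_feat = model.get_text_features(**inputs)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    # Dot product of normalised vectors = cosine similarity per image
    scores = (text_feat @ store["vectors"].T).squeeze(0)
    top = scores.topk(min(5, len(store["names"]))).indices.tolist()
    return jsonify([store["names"][i] for i in top])
```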
Clone the repository:

```bash
git clone https://github.com/ahmedembeddedxx/multimodal-search-engine.git
```

Start the backend server:

```bash
cd flaskapp/
flask run
```

Access the web application in your browser at http://127.0.0.1:5000/.
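Once the server is up, you can sanity-check it from a second terminal. The snippet below is a minimal smoke test and assumes the hypothetical `/search` route sketched earlier; substitute the app's real endpoint.

```python
# Smoke test against the local dev server (stdlib only).
# The /search route and "q" parameter are assumptions from the sketch above.
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({"q": "a dog playing on the beach"})
with urllib.request.urlopen(f"http://127.0.0.1:5000/search?{query}") as resp:
    print(resp.read().decode())
```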
- Port the frontend to ReactJS
- Use ImageBind by Meta AI
- More accurate model evaluation
- Integrate audio & video functionality