Welcome to the repo for the tutorial "Multimodal RAG with LangChain and Local LLaVa Model". This repo contains the code and helper data for the YouTube video tutorial, the third video in the series "Practical RAG with LangChain under 15 Minutes".
In this part of the series, we take RAG one level further and work with a multimodal model, LLaVa, and multimodal embeddings, OpenCLIP embeddings. All code runs on a local GPU: for the OpenCLIP embeddings we use the langchain-experimental library, and for the LLaVa model we use the Hugging Face transformers library.
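To give a sense of how these two pieces fit together, here is a minimal sketch. The OpenCLIP model name and checkpoint, the llava-hf/llava-1.5-7b-hf model id, and the sample_image.jpg file are illustrative assumptions, not necessarily what the notebook uses; refer to the Colab notebook for the exact setup.

```python
# Minimal sketch: OpenCLIP embeddings via langchain-experimental,
# LLaVa via Hugging Face transformers. Model ids/checkpoints are assumptions.
import torch
from PIL import Image
from langchain_experimental.open_clip import OpenCLIPEmbeddings
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Multimodal embeddings: embed both text and images into the same vector space.
clip_embeddings = OpenCLIPEmbeddings(model_name="ViT-B-32", checkpoint="laion2b_s34b_b79k")
text_vectors = clip_embeddings.embed_documents(["a photo of a cat"])
image_vectors = clip_embeddings.embed_image(["sample_image.jpg"])  # hypothetical local image

# Multimodal LLM: LLaVa loaded onto the local GPU.
model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(
    text=prompt, images=Image.open("sample_image.jpg"), return_tensors="pt"
).to("cuda", torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```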
The code is hosted on Google Colab for GPU availability. If you want to work on the Colab notebook directly, please navigate to: https://colab.research.google.com/drive/1BDRx_XhxHIYGatwWCfWpdV4ZUeTxF3s7?usp=sharing
and make your own copy of the notebook. You can also find the notebook in the repo. The requirements are installed in the first few lines of the notebook.
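For reference, an install cell for this stack might look like the following; the package list here is an assumption, and the notebook's own install cell is authoritative.

```python
# Hypothetical Colab install cell; exact packages and versions are in the notebook.
!pip install -q langchain langchain-experimental transformers accelerate open_clip_torch pillow torch
```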
You can also access the first video and the second video of the series on YouTube; their GitHub repos are linked in the videos.
If you would like more content, don't forget to visit my LLM fine-tuning series, Practical LLM Finetuning Recipes.
Thanks for visiting and happy coding!