Code and models for our information retrieval research papers.
Knowledge Computing and Service Group, Institute of Information Engineering, Chinese Academy of Sciences.
[AAAI25] tDRO: Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval. tDRO (task-level Distributionally Robust Optimization) is an algorithm for Large Language Model-based Dense Retrieval (LLM-DR) fine-tuning that aims to improve universal domain generalization by reweighting the data distribution of each task end to end. (Accepted by AAAI 2025. Main Conference, Long Paper, Poster.)
[SIGIR24] bowdpr: Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval. Bag-of-Word Prediction is a new encoder-only pre-training scheme for dense retrieval, designed for efficiency and interpretability. (Accepted by SIGIR 2024. Main Conference, Long Paper, Oral.)
[EMNLP23] CoT-MAE-qc: Query-as-context Pre-training for Dense Passage Retrieval. A simple yet effective pre-training scheme for single-vector Dense Passage Retrieval. (Accepted by EMNLP 2023. Main Conference, Long Paper, Poster.)
[AAAI23] CoT-MAE: ConTextual Mask Auto-Encoder for Dense Passage Retrieval. CoT-MAE is a Transformer-based Mask Auto-Encoder pre-training architecture designed for Dense Passage Retrieval. (Accepted by AAAI 2023. Main Conference, Long Paper, Oral.)
If you have questions, please feel free to email me.
Contacts: Guangyuan Ma ([email protected])