Official PyTorch implementation of the paper "LPAH: Language-guided Patch Aggregation Hashing for Fine-grained Image Retrieval" (PRCV 2025).
LPAH uses automatically synthesized textual semantics to guide token aggregation in ViT, preserving discriminative visual patterns for better hashing performance. The framework introduces three key modules: CLA, CPA, and PWA, which improve the model's ability to capture subtle visual distinctions and thereby enhance fine-grained image retrieval. The method is trained and evaluated on common fine-grained datasets: CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and VegFru.
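As a rough intuition for language-guided patch aggregation, the sketch below scores each ViT patch token against a text embedding, pools the tokens by the resulting attention weights, and binarizes a projection of the pooled feature. All names here, and the random-projection hashing head, are illustrative assumptions, not the actual LPAH modules:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_aggregate(patch_tokens, text_emb, hash_bits=32, seed=0):
    """Aggregate ViT patch tokens with text-derived attention weights,
    then map the pooled feature to +/-1 hash codes (illustrative only)."""
    rng = np.random.default_rng(seed)
    d = patch_tokens.shape[-1]
    # Score each patch token against the text embedding (scaled dot product).
    scores = patch_tokens @ text_emb / np.sqrt(d)   # (num_patches,)
    weights = softmax(scores)                       # attention weights, sum to 1
    pooled = weights @ patch_tokens                 # text-guided pooled feature, (d,)
    # A random projection stands in for the learned hashing head.
    W = rng.standard_normal((d, hash_bits))
    codes = np.sign(np.tanh(pooled @ W))            # binary codes in {-1, +1}
    return codes, weights

# Toy example: 14x14 = 196 patch tokens of dimension 768 (ViT-B/16 sizes).
tokens = np.random.default_rng(1).standard_normal((196, 768))
text = np.random.default_rng(2).standard_normal(768)
codes, w = text_guided_aggregate(tokens, text, hash_bits=16)
```

In the actual method the aggregation and hashing head are trained end-to-end; the point of the sketch is only the data flow from text-conditioned attention weights to binary codes.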
Please refer to requirements.txt for the required dependencies.
Download the datasets from the following sources:

- CUB-200-2011: Perona Lab - CUB-200-2011
- Stanford Dogs: Stanford Dogs dataset for Fine-Grained Visual Categorization
- FGVC-Aircraft: FGVC-Aircraft
- VegFru: VegFru: A Domain-Specific Dataset for Fine-grained Visual Categorization
Organize the downloaded data under the `datasets` directory as follows:

- datasets
  - CUB_200_2011
  - Stanford_Dogs
  - Aircraft
  - vegfru
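A quick sanity check for the layout above (an illustrative helper; `check_datasets` and the default root `datasets` are my assumptions, not part of the repo):

```python
import os

# Expected dataset folder names, matching the layout above.
EXPECTED = ["CUB_200_2011", "Stanford_Dogs", "Aircraft", "vegfru"]

def check_datasets(root="datasets"):
    """Return the expected dataset folders that are missing under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

missing = check_datasets()
if missing:
    print("Missing dataset folders:", ", ".join(missing))
```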
Navigate to the `text_generation` directory:

    cd text_generation

Then generate the textual semantics for your dataset:

- For CUB-200-2011 or Stanford Dogs, run `python cub_dogs.py`
- For FGVC-Aircraft, run `python aircraft.py`
- For VegFru, run `python vegfru.py`
Please visit the following links to download the pre-trained ViT models:

- ViT-B_16 pre-trained on ImageNet-21k: ViT Models - ImageNet-21K
- ViT-B_16 pre-trained on ImageNet with SAM: ViT Models - SAM - ImageNet
During training, the above models will be used as a backbone for LPAH.
Run the following command to start training and evaluation:

    python train.py --dataset [dataset_name] --epoch 100 --eval_every 5 --warmup_epochs 20 --name [logs_dir_name] --train_batch_size 64 --hash_bit_list 16,32,48,64 --learning_rate 0.02
Set `[dataset_name]` to one of:

- `CUB_200_2011` for CUB-200-2011
- `Stanford_Dogs` for Stanford Dogs
- `vegfru` for VegFru
- `Aircraft` for FGVC-Aircraft
The above command runs training of LPAH, with evaluation conducted every 5 epochs.
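For reference, hashing-based retrieval is typically evaluated by ranking the database by Hamming distance to each query code and reporting mean average precision (mAP). A minimal sketch, assuming ±1 codes (function names are my own, not the repo's evaluation code):

```python
import numpy as np

def hamming_dist(query_codes, db_codes):
    """Pairwise Hamming distances between +/-1 hash codes."""
    bits = query_codes.shape[1]
    # For +/-1 codes, inner product = bits - 2 * (number of differing bits).
    return 0.5 * (bits - query_codes @ db_codes.T)

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP over the full Hamming ranking of the database for each query."""
    dist = hamming_dist(query_codes, db_codes)
    aps = []
    for i in range(len(query_codes)):
        order = np.argsort(dist[i], kind="stable")          # nearest first
        relevant = (db_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue
        precision = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        aps.append((precision * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```

A query whose same-class items all rank ahead of other classes gets an average precision of 1.0; mAP averages this over all queries.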
