Official PyTorch implementation of the paper "LPAH: Language-guided Patch Aggregation Hashing for Fine-grained Image Retrieval" (PRCV 2025).
LPAH uses automatically synthesized textual semantics to guide token aggregation in ViT, preserving discriminative visual patterns for better hashing performance. The framework introduces three key modules: CLA, CPA, and PWA, which improve the model's ability to capture subtle visual distinctions and thereby enhance fine-grained image retrieval. The method is trained and evaluated on common fine-grained datasets: CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and VegFru.
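As a rough intuition for language-guided patch aggregation, the sketch below scores each ViT patch token against a text embedding, pools the tokens by the resulting attention weights, and binarizes a projection of the pooled feature. All names here, and the random-projection hashing head, are illustrative assumptions, not the actual LPAH modules:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_aggregate(patch_tokens, text_emb, hash_bits=32, seed=0):
    """Aggregate ViT patch tokens with text-derived attention weights,
    then map the pooled feature to +/-1 hash codes (illustrative only)."""
    rng = np.random.default_rng(seed)
    d = patch_tokens.shape[-1]
    # Score each patch token against the text embedding (scaled dot product).
    scores = patch_tokens @ text_emb / np.sqrt(d)   # (num_patches,)
    weights = softmax(scores)                       # attention weights, sum to 1
    pooled = weights @ patch_tokens                 # text-guided pooled feature, (d,)
    # A random projection stands in for the learned hashing head.
    W = rng.standard_normal((d, hash_bits))
    codes = np.sign(np.tanh(pooled @ W))            # binary codes in {-1, +1}
    return codes, weights

# Toy example: 14x14 = 196 patch tokens of dimension 768 (ViT-B/16 sizes).
tokens = np.random.default_rng(1).standard_normal((196, 768))
text = np.random.default_rng(2).standard_normal(768)
codes, w = text_guided_aggregate(tokens, text, hash_bits=16)
```

In the actual method the aggregation and hashing head are trained end-to-end; the point of the sketch is only the data flow from text-conditioned attention weights to binary codes.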
Please refer to requirements.txt for the required dependencies.
Download the datasets from the following sources:

- CUB-200-2011: Perona Lab - CUB-200-2011
- Stanford Dogs: Stanford Dogs dataset for Fine-Grained Visual Categorization
- FGVC-Aircraft: FGVC-Aircraft
- VegFru: VegFru: A Domain-Specific Dataset for Fine-grained Visual Categorization
Organize the downloaded data under the `datasets` directory as follows:

- datasets
  - CUB_200_2011
  - Stanford_Dogs
  - Aircraft
  - vegfru
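A quick sanity check for the layout above (an illustrative helper; `check_datasets` and the default root `datasets` are my assumptions, not part of the repo):

```python
import os

# Expected dataset folder names, matching the layout above.
EXPECTED = ["CUB_200_2011", "Stanford_Dogs", "Aircraft", "vegfru"]

def check_datasets(root="datasets"):
    """Return the expected dataset folders that are missing under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

missing = check_datasets()
if missing:
    print("Missing dataset folders:", ", ".join(missing))
```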
Navigate to the `text_generation` directory:

    cd text_generation

Then generate the textual semantics for your dataset:

- For CUB-200-2011 or Stanford Dogs, run `python cub_dogs.py`
- For FGVC-Aircraft, run `python aircraft.py`
- For VegFru, run `python vegfru.py`
Please visit the following links to download the pre-trained ViT models:

- ViT-B_16 pre-trained on ImageNet-21k: ViT Models - ImageNet-21K
- ViT-B_16 pre-trained on ImageNet with SAM: ViT Models - SAM - ImageNet
During training, the above models will be used as a backbone for LPAH.
Run the following command to start training and evaluation:

    python train.py --dataset [dataset_name] --epoch 100 --eval_every 5 --warmup_epochs 20 --name [logs_dir_name] --train_batch_size 64 --hash_bit_list 16,32,48,64 --learning_rate 0.02
Set `[dataset_name]` to one of:

- `CUB_200_2011` for CUB-200-2011
- `Stanford_Dogs` for Stanford Dogs
- `vegfru` for VegFru
- `Aircraft` for FGVC-Aircraft
The above command runs training of LPAH, with evaluation conducted every 5 epochs.
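For reference, hashing-based retrieval is typically evaluated by ranking the database by Hamming distance to each query code and reporting mean average precision (mAP). A minimal sketch, assuming ±1 codes (function names are my own, not the repo's evaluation code):

```python
import numpy as np

def hamming_dist(query_codes, db_codes):
    """Pairwise Hamming distances between +/-1 hash codes."""
    bits = query_codes.shape[1]
    # For +/-1 codes, inner product = bits - 2 * (number of differing bits).
    return 0.5 * (bits - query_codes @ db_codes.T)

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP over the full Hamming ranking of the database for each query."""
    dist = hamming_dist(query_codes, db_codes)
    aps = []
    for i in range(len(query_codes)):
        order = np.argsort(dist[i], kind="stable")          # nearest first
        relevant = (db_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue
        precision = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        aps.append((precision * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```

A query whose same-class items all rank ahead of other classes gets an average precision of 1.0; mAP averages this over all queries.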
