This is the source code for FakeReasoning: Towards Generalizable Forgery Detection and Reasoning. In this paper:
- We formulate detection and explanation as a unified Forgery Detection and Reasoning task (FDR-Task), leveraging Multi-Modal Large Language Models (MLLMs) to provide accurate detection through reliable reasoning over forgery attributes.
- We introduce the Multi-Modal Forgery Reasoning dataset (MMFR-Dataset), a large-scale dataset containing 120K images across 10 generative models, with 378K reasoning annotations on forgery attributes, enabling comprehensive evaluation of the FDR-Task.
- We propose FakeReasoning, a forgery detection and reasoning framework with three key components: 1) a dual-branch visual encoder that integrates CLIP and DINO to capture both high-level semantics and low-level artifacts; 2) a Forgery-Aware Feature Fusion Module that leverages DINO's attention maps and cross-attention mechanisms to guide MLLMs toward forgery-related clues; 3) a Classification Probability Mapper that couples language modeling and forgery detection, enhancing overall performance.
- Aug 27 2025: The pretrained model and source code are released. If you have followed our earlier work, please note that both the dataset and method have been updated. Check details on arXiv.
- Jun 11 2025: The MMFR-Dataset is released! We also provide code to reproduce our dataset construction pipeline.
- Apr 15 2025: The Project Page of our paper has been published! Visit it to find out more about the performance of FakeReasoning and samples from the MMFR-Dataset.
- Mar 27 2025: Our paper is released on arXiv.
The training set of MMFR-Dataset contains 50K fake images with 129K reasoning annotations and 50K real images with 183K reasoning annotations. The evaluation sets of MMFR-Dataset contain 20K images with 66K reasoning annotations across 10 generative models.
MMFR-Dataset is available on Hugging Face. Download all split .tar files, concatenate them into a single archive, and then extract the dataset.
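A minimal shell sketch of this step (the split archive names below are assumptions; substitute the file names you actually downloaded from Hugging Face):

cat MMFR-Dataset.tar.part* > MMFR-Dataset.tar   # concatenate the split archives in order
tar -xf MMFR-Dataset.tar                        # extract the combined archive

After extraction, the dataset is organized as follows: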
./
├── diffusiondb
│ ├── part-000001
│ │ ├── 0a3c75bb-4bd0-47c8-a2ba-e2aee92ad43f.png
│ │ └── [...]
│ ├── [...]
│ ├── part-000051
│ └── diffusiondb_reasoning.json
├── laion
│ ├── 00000
│ │ ├── 000000000.jpg
│ │ └── [...]
│ ├── [...]
│ ├── 00047
│ └── laion_reasoning.json
├── evaluation_sets
│ ├── stablediffusion
│ │ ├── 0_real
│ │ ├── 1_fake
│ │ └── stablediffusion_reasoning.json
│ ├── [...]
│ └── gigagan
└── forgery_reasoning_cot.json
forgery_reasoning_cot.json contains instruction-CoT annotations for the training set. We also provide the original reasoning annotations for the training set in diffusiondb_reasoning.json and laion_reasoning.json. Reasoning annotations for the evaluation sets, such as stablediffusion_reasoning.json, can be found within their respective subfolders.
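As a quick sanity check after extraction, you can peek at an annotation file with jq (a hedged sketch; it assumes each annotation file is a top-level JSON array, which may not match the actual schema):

jq 'length' forgery_reasoning_cot.json   # number of annotation records
jq '.[0]' forgery_reasoning_cot.json     # inspect the first record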
Code is included in ./mmfr_generation/. We use the GPT-4o batch API for dataset generation. To follow our construction pipeline (see the sketch after this list):
- Generate .jsonl files for batch requests with get_jsonl.py.
- Upload your .jsonl files and collect the GPT-4o output with batch_api_generation.ipynb.
- Organize the raw GPT-4o output into structured reasoning annotations with output_to_reasoning.py.
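A hedged sketch of these three steps as shell commands (the scripts' command-line arguments are omitted here and may differ; check each script before running):

cd mmfr_generation
python get_jsonl.py               # step 1: build the .jsonl batch requests
# step 2: run batch_api_generation.ipynb to upload the requests and download the GPT-4o outputs
python output_to_reasoning.py     # step 3: convert the raw outputs into structured reasoning annotations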
The implementation is based on torch==2.1.2+cu121.
- Clone this repository and navigate to the LLaVA folder
git clone https://github.com/PRIS-CV/FakeReasoning.git
cd LLaVA
- Install required packages
conda create -n fakereasoning python=3.10
conda activate fakereasoning
pip install -e .
- Install additional dependencies for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
If the installation of flash-attn fails, please visit the official GitHub release page and install the corresponding .whl package.
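For example (the wheel path below is a placeholder; pick the build on the release page that matches your CUDA, PyTorch, and Python versions):

pip install path_to_downloaded_flash_attn_wheel.whl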
- Install additional dependencies for evaluation
pip install nltk
pip install rouge-score
- Download base models
FakeReasoning is built upon the following base models: openai/clip-vit-large-patch14-336, DINOv2 (dinov2_vitl14), and the MoF models from MMVP. Please download the corresponding pretrained weights before running the framework.
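One way to fetch the CLIP weights is the Hugging Face CLI (a hedged example; the local directory name is a placeholder, and the DINOv2 checkpoint dinov2_vitl14_pretrain.pth can be obtained from the DINOv2 repository):

pip install -U "huggingface_hub[cli]"
huggingface-cli download openai/clip-vit-large-patch14-336 --local-dir path_to_clip-vit-large-patch14-336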
The pretrained model of FakeReasoning is available on Hugging Face. To use your local weights of openai/clip-vit-large-patch14-336, set "mm_vision_tower" in config.json to path_to_clip-vit-large-patch14-336.
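A hedged one-liner for that edit (assumes jq is installed and that config.json sits in the downloaded FakeReasoning weights folder; editing the file by hand works just as well):

jq '.mm_vision_tower = "path_to_clip-vit-large-patch14-336"' path_to_FakeReasoning_weights/config.json > config.tmp && mv config.tmp path_to_FakeReasoning_weights/config.json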
cd LLaVA/forgery_eval
export DINO_PATH='path_to_dinov2-main'
export DINO_WEIGHT='path_to_dinov2_vitl14_pretrain.pth'
python inference.py \
--model-path path_to_FakeReasoning_weights \
--img_path commonFake_COCO_if_stage_III_189.png
python eval.py \
--model-path path_to_FakeReasoning_weights \
--dataset_path path_to_MMFR-Dataset \
--result_folder ./results \
--clip_path path_to_clip-vit-large-patch14-336
FakeReasoning is trained on 8× A800 GPUs (40GB) for 3 epochs, with the entire training completed in about 7 hours.
cd LLaVA
export DINO_PATH='path_to_dinov2-main'
export DINO_WEIGHT='path_to_dinov2_vitl14_pretrain.pth'
bash finetune_task_lora.sh \
--data_path path_to_forgery_reasoning_cot.json \
--model_name_or_path path_to_MoF_Models \
--image_folder path_to_MMFR-Dataset \
--vision_tower path_to_clip-vit-large-patch14-336
Keep the effective batch size fixed: per_device_train_batch_size × gradient_accumulation_steps × num_gpus = 128 (for example, 4 × 4 × 8 on 8 GPUs).
If you find this work useful for your research, please cite our paper:
@article{gao2025fakereasoning,
title={FakeReasoning: Towards Generalizable Forgery Detection and Reasoning},
author={Gao, Yueying and Chang, Dongliang and Yu, Bingyao and Qin, Haotian and Chen, Lei and Liang, Kongming and Ma, Zhanyu},
journal={arXiv preprint arXiv:2503.21210},
year={2025},
url={https://arxiv.org/abs/2503.21210}
}
We are grateful to LLaVA, MMVP, DINOv2, UniFD, and MCAN for open-sourcing their models and code.