FiVE-Bench (ICCV 2025)
Minghan Li1*, Chenxi Xie2*, Yichen Wu13, Lei Zhang2, Mengyu Wang1†
1Harvard University 2The Hong Kong Polytechnic University 3City University of Hong Kong
*Equal contribution †Corresponding Author
💜 Leaderboard (coming soon) | 💻 GitHub | 🤗 Hugging Face
📝 Project Page | 📰 Paper | 🎥 Video Demo
- [🔜] Add leaderboard support
- [🔜] Add `Wan-Edit` demo page on HF
- [✅ Aug-26-2025] Fix two issues: `mp4_to_frames_ffmpeg` and `skip_timestep=17`
- [✅ Aug-05-2025] Release `Wan-Edit` implementation
- [✅ Aug-05-2025] Release `Pyramid-Edit` implementation
- [✅ Aug-02-2025] Add Wan-Edit results to HF for eval demo
- [✅ Aug-02-2025] Evaluation code released
- [✅ Mar-31-2025] Dataset uploaded to Hugging Face
- FiVE-Bench Overview
- Running Your Model on FiVE-Bench
- Evaluate Editing Results
- Citation
- Acknowledgement
The FiVE-Bench dataset offers a rich, structured benchmark for fine-grained video editing. It includes 420 high-quality source-target prompt pairs spanning six fine-grained video editing tasks:
- Object Replacement (Rigid)
- Object Replacement (Non-Rigid)
- Color Alteration
- Material Modification
- Object Addition
- Object Removal
- Download the dataset from Hugging Face: 🔗 FiVE-Bench on Hugging Face
- Follow the instructions in the Installation Guide to download the dataset and install the evaluation code (`FiVE_Bench`).
- Place the downloaded dataset in the directory `./FiVE_Bench/data`. The data structure should look like:

📁 /path/to/code/FiVE_Bench/data
├── 📁 assets/
├── 📁 edit_prompt/
│   ├── 📄 edit1_FiVE.json
│   ├── 📄 edit2_FiVE.json
│   ├── 📄 edit3_FiVE.json
│   ├── 📄 edit4_FiVE.json
│   ├── 📄 edit5_FiVE.json
│   └── 📄 edit6_FiVE.json
├── 📄 README.md
├── 📦 bmasks.zip
├── 📁 bmasks/
│   ├── 📁 0001_bus/
│   │   ├── 🖼️ 00001.jpg
│   │   ├── 🖼️ 00002.jpg
│   │   └── 🖼️ ...
│   └── 📁 ...
├── 📦 images.zip
├── 📁 images/
│   ├── 📁 0001_bus/
│   │   ├── 🖼️ 00001.jpg
│   │   ├── 🖼️ 00002.jpg
│   │   └── 🖼️ ...
│   └── 📁 ...
├── 📦 videos.zip
└── 📁 videos/
    ├── 🎞️ 0001_bus.mp4
    ├── 🎞️ 0002_girl-dog.mp4
    └── 🎞️ ...
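After extracting the archives, it can help to sanity-check the layout before running any model. The sketch below is a minimal check under the assumptions that the zips have already been unpacked as shown above and that each top-level entry in an `edit*_FiVE.json` file corresponds to one source-target prompt pair (the exact JSON schema may differ).

```python
# Minimal sketch: verify the FiVE-Bench data layout and count prompt entries.
# Assumes the archives were extracted as in the tree above; the per-file entry
# count is only an approximation of the number of prompt pairs.
import json
from pathlib import Path

DATA_ROOT = Path("./FiVE_Bench/data")  # adjust to your checkout

def check_layout(root: Path) -> None:
    """Report missing sub-directories and the entry count of each prompt file."""
    for sub in ["edit_prompt", "bmasks", "images", "videos"]:
        status = "ok" if (root / sub).is_dir() else "MISSING"
        print(f"{sub:12s} {status}")

    for i in range(1, 7):
        prompt_file = root / "edit_prompt" / f"edit{i}_FiVE.json"
        if not prompt_file.is_file():
            print(f"edit{i}_FiVE.json MISSING")
            continue
        data = json.loads(prompt_file.read_text())
        # Count top-level entries whether the file is a list or a dict.
        n = len(data) if isinstance(data, (list, dict)) else 0
        print(f"edit{i}_FiVE.json: {n} entries")

if __name__ == "__main__":
    check_layout(DATA_ROOT)
```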
Use your video editing method to edit the FiVE-Bench videos based on the provided text prompts and generate the corresponding edited results.
Example implementations of our proposed rectified flow (RF)-based video editing methods are provided in the `models/` directory:
- **[Pyramid-Edit](models/README.md#pyramid-edit)**: Diffusion-based video editing using Pyramid-Flow architecture
- **[Wan-Edit](models/README.md#wan-edit)**: Rectified flow-based video editing with Wan2.1-T2V-1.3B model
Run Pyramid-Edit:
# Setup model
cd models/pyramid-edit && mkdir -p hf/pyramid-flow-miniflux
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh
Run Wan-Edit:
# Setup model
cd models/wan-edit && mkdir -p hf/Wan2.1-T2V-1.3B
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh
For detailed setup instructions and configuration options, see the Models Documentation.
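If you are benchmarking your own method rather than the provided ones, a generic driver loop over the six prompt files is usually enough. The sketch below is only an illustration: `edit_video`, the output layout under `./results/my_method`, and the prompt fields `source_prompt`/`target_prompt` are hypothetical placeholders, not the benchmark's required interface; adapt them to your model and to the layout expected by the evaluation script.

```python
# Sketch: driver loop for running a custom editing method on FiVE-Bench.
# The JSON schema and output directory layout below are assumptions.
import json
from pathlib import Path

DATA_ROOT = Path("./FiVE_Bench/data")
OUTPUT_ROOT = Path("./results/my_method")  # hypothetical output location

def edit_video(video_path: Path, source_prompt: str, target_prompt: str, out_dir: Path) -> None:
    """Placeholder: run your editing model and write the edited result to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # ... your model inference goes here ...

def run_all_tasks() -> None:
    for task_id in range(1, 7):
        prompt_file = DATA_ROOT / "edit_prompt" / f"edit{task_id}_FiVE.json"
        prompts = json.loads(prompt_file.read_text())
        # Assumed schema: mapping from video name to a dict of prompt fields.
        items = prompts.items() if isinstance(prompts, dict) else enumerate(prompts)
        for name, entry in items:
            video = DATA_ROOT / "videos" / f"{name}.mp4"
            edit_video(
                video,
                entry.get("source_prompt", ""),
                entry.get("target_prompt", ""),
                OUTPUT_ROOT / f"edit{task_id}" / str(name),
            )

if __name__ == "__main__":
    run_all_tasks()
```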
Follow the Installation Guide to set up the evaluation code, then run the script below to get the evaluation results:
sh scripts/eval_FiVE.sh
Evaluation Support Elements:
- Editing Masks: Generated using SAM2 to assist in localized metric evaluation.
- Editing Instructions: Structured directives for each source-target pair to guide model behavior.
FiVE-Bench provides comprehensive evaluation through two major components: automatic metrics and the VLM-based FiVE-Acc.
The automatic metrics quantitatively measure various dimensions of video editing quality:
- Structure Preservation
- Background Preservation (PSNR, LPIPS, MSE, SSIM outside the editing mask)
- Edit Prompt–Image Consistency (CLIP similarity on full and masked images)
- Image Quality Assessment (NIQE)
- Temporal Consistency (MFS: Motion Fidelity Score)
- Runtime Efficiency
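To make the role of the SAM2 editing masks concrete, the sketch below shows one way a background-preservation metric can be restricted to pixels outside the editing mask. It is a minimal PSNR example under assumed file paths, not the released evaluation code, which also covers LPIPS, SSIM, CLIP similarity, NIQE, and MFS.

```python
# Sketch: background PSNR computed only outside the editing mask.
# Paths and the mask threshold are illustrative assumptions.
import numpy as np
from PIL import Image

def masked_background_psnr(src_frame: str, edit_frame: str, mask_path: str) -> float:
    """PSNR over pixels *outside* the editing mask (the preserved background)."""
    src = np.asarray(Image.open(src_frame).convert("RGB"), dtype=np.float64)
    edt = np.asarray(Image.open(edit_frame).convert("RGB"), dtype=np.float64)
    mask = np.asarray(Image.open(mask_path).convert("L")) > 127  # True = edited region
    background = ~mask
    mse = np.mean((src[background] - edt[background]) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)

# Example usage (paths are illustrative):
# psnr = masked_background_psnr("data/images/0001_bus/00001.jpg",
#                               "results/0001_bus/00001.jpg",
#                               "data/bmasks/0001_bus/00001.jpg")
```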
We use a vision-language model (VLM) to automatically assess whether the intended edits are reflected in the video outputs by asking it questions about the content. For example, if the source video contains a swan and the target prompt requests a flamingo, we ask the following about the edited video:
- Yes/No Questions:
  - Is there a swan in the video?
  - Is there a flamingo in the video?

  ✅ The edit is considered successful only if the answers are "No" to the first question and "Yes" to the second.

- Multiple-choice Questions:
  - What is in the video? a) A swan b) A flamingo

  ✅ The edit is considered successful if the model selects the correct target object (e.g., b) A flamingo) and avoids selecting the original source object.
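The sketch below illustrates how such questions could be built and judged for a single edit. The exact question templates are taken from the swan/flamingo example above, but the judging logic and the idea of an `ask_vlm` call are assumptions; the released evaluation code defines the authoritative procedure.

```python
# Sketch: building the yes/no and multiple-choice questions for one edit and
# judging the VLM's raw answers. Templates follow the swan/flamingo example;
# the answer-parsing rules are simplifying assumptions.
def build_questions(source_obj: str, target_obj: str) -> dict:
    return {
        "yes_no": [
            f"Is there a {source_obj} in the video?",   # expected answer: "No"
            f"Is there a {target_obj} in the video?",   # expected answer: "Yes"
        ],
        "multiple_choice": (
            f"What is in the video? a) A {source_obj} b) A {target_obj}"  # expected: "b"
        ),
    }

def judge_edit(answers_yes_no: list[str], answer_mc: str) -> dict:
    """Decide success per question type from the VLM's raw answers."""
    yn_success = (answers_yes_no[0].strip().lower().startswith("no")
                  and answers_yes_no[1].strip().lower().startswith("yes"))
    mc_success = answer_mc.strip().lower().startswith("b")
    return {"yn": yn_success, "mc": mc_success}

# Example:
# qs = build_questions("swan", "flamingo")
# result = judge_edit(["No", "Yes"], "b) A flamingo")  # {"yn": True, "mc": True}
```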
FiVE-Acc evaluates editing success using a vision-language model (VLM) by asking content-related questions:
- YN-Acc: Yes/No question accuracy
- MC-Acc: Multiple-choice question accuracy
- U-Acc: Union accuracy – success if any question is correct
- ∩-Acc: Intersection accuracy – success only if all questions are correct
- FiVE-Acc ↑: Final score = average of all above metrics (higher is better)
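Given per-video yes/no and multiple-choice outcomes, the aggregation into the metrics above is straightforward. The sketch below is a minimal reading of those definitions (union = any question correct, intersection = all correct, FiVE-Acc = average of the four); the official scores come from the released evaluation code.

```python
# Sketch: aggregating per-video outcomes into the FiVE-Acc metrics.
# Each item in `results` is {"yn": bool, "mc": bool} for one edited video.
def five_acc(results: list[dict]) -> dict:
    n = len(results)
    yn_acc = sum(r["yn"] for r in results) / n
    mc_acc = sum(r["mc"] for r in results) / n
    u_acc = sum(r["yn"] or r["mc"] for r in results) / n    # any question correct
    i_acc = sum(r["yn"] and r["mc"] for r in results) / n   # all questions correct
    return {
        "YN-Acc": yn_acc,
        "MC-Acc": mc_acc,
        "U-Acc": u_acc,
        "∩-Acc": i_acc,
        "FiVE-Acc": (yn_acc + mc_acc + u_acc + i_acc) / 4,
    }

# Example:
# print(five_acc([{"yn": True, "mc": True}, {"yn": False, "mc": True}]))
```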
If you use FiVE-Bench in your research, please cite us:
@article{li2025five,
  title={{FiVE}: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models},
author={Li, Minghan and Xie, Chenxi and Wu, Yichen and Zhang, Lei and Wang, Mengyu},
journal={arXiv preprint arXiv:2503.13684},
year={2025}
}
Part of the code is adapted from PIE-Bench.
We thank the authors for their excellent work and for making their code publicly available.