FiVE-Bench (ICCV 2025)

FiVE-Bench: A Fine-Grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models

Minghan Li¹*, Chenxi Xie²*, Yichen Wu¹,³, Lei Zhang², Mengyu Wang¹†
¹Harvard University  ²The Hong Kong Polytechnic University  ³City University of Hong Kong
*Equal contribution  †Corresponding author

💜 Leaderboard (coming soon)   |   💻 GitHub   |   🤗 Hugging Face  

📝 Project Page   |   📰 Paper   |   🎥 Video Demo  

[Figure: FiVE-Bench pipeline overview]


📝 TODO List

  • [🔜] Add leaderboard support
  • [🔜] Add Wan-Edit demo page on HF
  • [✅ Aug-26-2025] Fix two issues: mp4_to_frames_ffmpeg and skip_timestep=17
  • [✅ Aug-05-2025] Release Wan-Edit implementation
  • [✅ Aug-05-2025] Release Pyramid-Edit implementation
  • [✅ Aug-02-2025] Add Wan-Edit results to HF for eval demo
  • [✅ Aug-02-2025] Evaluation code released
  • [✅ Mar-31-2025] Dataset uploaded to Hugging Face



📦 FiVE-Bench Overview

[Figure: FiVE-Bench dataset overview]

The FiVE-Bench dataset offers a rich, structured benchmark for fine-grained video editing, comprising 420 high-quality source–target prompt pairs that span six fine-grained editing tasks:

  1. Object Replacement (Rigid)
  2. Object Replacement (Non-Rigid)
  3. Color Alteration
  4. Material Modification
  5. Object Addition
  6. Object Removal

Running Your Model on FiVE-Bench

[Figure: running your model on FiVE-Bench]


⬇️ Step 1: Download the Dataset and Set Up Evaluation Code

  • Download the dataset from Hugging Face: 🔗 FiVE-Bench on Hugging Face

  • Follow the instructions in the Installation Guide to download the dataset and install the evaluation code (FiVE_Bench).

  • Place the downloaded dataset in the directory ./FiVE_Bench/data. The data structure should look like the tree below (a scripted-download sketch follows it):

    📁 /path/to/code/FiVE_Bench/data
    ├── 📁 assets/
    ├── 📁 edit_prompt/
    │   ├── 📄 edit1_FiVE.json
    │   ├── 📄 edit2_FiVE.json
    │   ├── 📄 edit3_FiVE.json
    │   ├── 📄 edit4_FiVE.json
    │   ├── 📄 edit5_FiVE.json
    │   └── 📄 edit6_FiVE.json
    ├── 📄 README.md
    ├── 📦 bmasks.zip
    ├── 📁 bmasks/
    │   ├── 📁 0001_bus/
    │   │   ├── 🖼️ 00001.jpg
    │   │   ├── 🖼️ 00002.jpg
    │   │   └── 🖼️ ...
    │   └── 📁 ...
    ├── 📦 images.zip
    ├── 📁 images/
    │   ├── 📁 0001_bus/
    │   │   ├── 🖼️ 00001.jpg
    │   │   ├── 🖼️ 00002.jpg
    │   │   └── 🖼️ ...
    │   └── 📁 ...
    ├── 📦 videos.zip
    └── 📁 videos/
        ├── 🎞️ 0001_bus.mp4
        ├── 🎞️ 0002_girl-dog.mp4
        └── 🎞️ ...
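
If you prefer to script the download, a minimal sketch using huggingface_hub is shown below. The repo id is a placeholder — substitute the FiVE-Bench dataset id from the Hugging Face link above.

# Sketch: fetch the dataset snapshot and unpack the zipped folders.
# NOTE: the repo id is a placeholder; use the FiVE-Bench dataset id
# from the Hugging Face link above.
import zipfile
from pathlib import Path

from huggingface_hub import snapshot_download

data_dir = Path("./FiVE_Bench/data")
snapshot_download(
    repo_id="<org>/FiVE-Bench",  # placeholder dataset id
    repo_type="dataset",
    local_dir=str(data_dir),
)

# Unpack bmasks.zip, images.zip, and videos.zip in place.
for archive in data_dir.glob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)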

🛠️ Step 2: Apply Your Video Editing Method

Apply your video editing method to the FiVE-Bench videos using the provided text prompts, and save the edited results for evaluation.
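
A minimal driver sketch for this step is shown below. The JSON layout and field names (source_prompt, target_prompt) are assumptions about the edit*_FiVE.json schema, and edit_video is a stub for your own method.

# Sketch: run your editor over one task's prompt pairs.
# ASSUMPTIONS: the JSON layout and field names below are illustrative;
# check the edit*_FiVE.json files for the actual schema.
import json
from pathlib import Path

def edit_video(frames, source_prompt, target_prompt):
    """Stub: plug in your video editing method here."""
    raise NotImplementedError

data = Path("FiVE_Bench/data")
pairs = json.loads((data / "edit_prompt" / "edit1_FiVE.json").read_text())

for video_id, pair in pairs.items():            # assumed: dict keyed by video id
    frames = sorted((data / "images" / video_id).glob("*.jpg"))
    edited = edit_video(frames,
                        pair["source_prompt"],  # assumed field name
                        pair["target_prompt"])  # assumed field name
    # Save the edited frames under results/<video_id>/ for Step 3.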

[Figure: rectified-flow-based video editing]

Example implementations of our proposed rectified flow (RF)-based video editing methods are provided in the models/ directory:

  • Pyramid-Edit (models/README.md#pyramid-edit): diffusion-based video editing built on the Pyramid-Flow architecture

  • Wan-Edit (models/README.md#wan-edit): rectified-flow-based video editing built on the Wan2.1-T2V-1.3B model

Quick Start with Provided Models

Run Pyramid-Edit:

# Setup model
cd models/pyramid-edit && mkdir -p hf/pyramid-flow-miniflux
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh

Run Wan-Edit:

# Setup model  
cd models/wan-edit && mkdir -p hf/Wan2.1-T2V-1.3B
# Download model checkpoint to hf/ directory
bash scripts/run_FiVE.sh
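
The run scripts expect the checkpoints under the hf/ directories created above. Below is a minimal Python sketch for fetching them; the Hugging Face repo ids are assumptions, so confirm them in the Models Documentation.

# Sketch: download the two checkpoints into the hf/ folders the scripts expect.
# The repo ids are assumptions; confirm them in the Models Documentation.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="rain1011/pyramid-flow-miniflux",  # assumed Pyramid-Flow repo id
    local_dir="models/pyramid-edit/hf/pyramid-flow-miniflux",
)
snapshot_download(
    repo_id="Wan-AI/Wan2.1-T2V-1.3B",  # assumed Wan2.1 repo id
    local_dir="models/wan-edit/hf/Wan2.1-T2V-1.3B",
)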

For detailed setup instructions and configuration options, see the Models Documentation.


📊 Step 3: Evaluate Editing Results

Follow the Installation Guide to set up the evaluation code, then run:

sh scripts/eval_FiVE.sh

Supporting assets for evaluation:

  • Editing Masks: Generated using SAM2 to assist in localized metric evaluation.

  • Editing Instructions: Structured directives for each source-target pair to guide model behavior.

FiVE-Bench provides comprehensive evaluation through two major components:

📐 1. Conventional Metrics (Across Six Key Aspects)

These metrics quantitatively measure complementary dimensions of video editing quality; a sketch of the background-preservation computation follows the list:

  • Structure Preservation
  • Background Preservation
    (PSNR, LPIPS, MSE, SSIM outside the editing mask)
  • Edit Prompt–Image Consistency
    (CLIP similarity on full and masked images)
  • Image Quality Assessment
    (NIQE)
  • Temporal Consistency
    (Motion Fidelity Score, MFS)
  • Runtime Efficiency
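
As referenced above, here is a minimal sketch of the background-preservation computation, assuming frames and the SAM2 masks are loaded as uint8 numpy arrays with nonzero mask values marking the edited region; it is illustrative, not the benchmark's exact implementation.

# Sketch: background preservation (MSE / PSNR) outside the editing mask.
# Assumes uint8 arrays: src/edit are HxWx3 frames, mask is HxW with
# nonzero values marking the edited region (as in bmasks/).
import numpy as np

def background_mse_psnr(src: np.ndarray, edit: np.ndarray, mask: np.ndarray):
    outside = mask == 0                              # background pixels only
    diff = src.astype(np.float64) - edit.astype(np.float64)
    mse = float((diff[outside] ** 2).mean())
    psnr = 10.0 * np.log10(255.0 ** 2 / max(mse, 1e-12))
    return mse, psnr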

[Figure: evaluation results on conventional metrics]

🤖 2. FiVE-Acc: A VLM-based Metric for Editing Success

We use a vision-language model (VLM) to automatically assess whether the intended edits appear in the output videos by asking it questions about the content. For example, if the source video contains a swan and the target prompt requests a flamingo, we ask the following about the edited video:

  • Yes/No Questions:

    • Is there a swan in the video?
    • Is there a flamingo in the video?

    ✅ The edit is considered successful only if the answers are "No" to the first question and "Yes" to the second.

  • Multiple-choice Questions:

    • What is in the video? a) A swan b) A flamingo

    ✅ The edit is considered successful if the model selects the correct target object (e.g., b) A flamingo) and avoids selecting the original source object.

These VLM verdicts are aggregated into the following accuracy scores; a minimal aggregation sketch follows the list:

  • YN-Acc: Yes/No question accuracy
  • MC-Acc: Multiple-choice question accuracy
  • U-Acc: Union accuracy – success if any question is correct
  • ∩-Acc: Intersection accuracy – success only if all questions are correct
  • FiVE-Acc ↑: Final score = average of all above metrics (higher is better)
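
A minimal sketch of this aggregation, assuming one Yes/No and one multiple-choice verdict per video (the benchmark may average over more questions):

# Sketch: aggregate per-video VLM verdicts into the FiVE-Acc scores.
# yn[i] / mc[i]: did video i pass the Yes/No (resp. multiple-choice) check?
from statistics import mean

def five_acc(yn: list[bool], mc: list[bool]) -> dict[str, float]:
    yn_acc = mean(yn)
    mc_acc = mean(mc)
    u_acc = mean(a or b for a, b in zip(yn, mc))   # union: any check passes
    i_acc = mean(a and b for a, b in zip(yn, mc))  # intersection: all pass
    scores = {"YN-Acc": yn_acc, "MC-Acc": mc_acc, "U-Acc": u_acc, "∩-Acc": i_acc}
    scores["FiVE-Acc"] = mean(scores.values())     # final score: the average
    return scores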

[Figure: FiVE-Acc evaluation results]

📚 Citation

If you use FiVE-Bench in your research, please cite us:

@article{li2025five,
  title={{FiVE}: A fine-grained video editing benchmark for evaluating emerging diffusion and rectified flow models},
  author={Li, Minghan and Xie, Chenxi and Wu, Yichen and Zhang, Lei and Wang, Mengyu},
  journal={arXiv preprint arXiv:2503.13684},
  year={2025}
}

❤️ Acknowledgement

Part of the code is adapted from PIE-Bench.
We thank the authors for their excellent work and for making their code publicly available.
