This is the official PyTorch implementation of our paper:
TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object Detection, ICCVW 2025
Suhwan Cho, Minhyeok Lee, Jungho Lee, Sunghun Yang, Sangyoun Lee
Link: [ICCVW] [arXiv]
You can also find other related papers at awesome-video-object-segmentation.
Leveraging large-scale image datasets is a common strategy in video processing tasks. However, motion-guided approaches benefit little from this strategy, as spatial distortions of static images cannot effectively simulate realistic motion dynamics. In this study, we demonstrate that image-to-video generation models can produce realistic optical flows by appropriately animating static images. Our synthetic video dataset, DUTS-Video, provides a valuable resource for future research.
1. Download the datasets: DUTS, DAVIS16, DAVSOD, FBMS, ViSal.
2. Estimate and save optical flow maps from the videos using RAFT (see the first sketch after this list).
3. For DUTS, simulate future frames and optical flow maps using Stable Video Diffusion (see the second sketch after this list).
4. Alternatively, we also provide the pre-processed datasets: DUTS-Video, DAVIS16, FBMS, DAVSOD, ViSal.
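For step 2, the repository's exact flow-estimation script isn't reproduced here; as a minimal sketch, the pre-trained RAFT bundled with torchvision can estimate and save per-frame flow (the file paths, and saving flows as .pt tensors, are assumptions):

```python
import torch
from torchvision.io import read_image
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# Load two consecutive frames (placeholder paths); RAFT expects H and W divisible by 8
frame1 = read_image("frames/00000.jpg").unsqueeze(0)  # (1, 3, H, W), uint8
frame2 = read_image("frames/00001.jpg").unsqueeze(0)

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()
preprocess = weights.transforms()  # normalizes both frames as RAFT expects

frame1, frame2 = preprocess(frame1, frame2)
with torch.no_grad():
    flow_iters = model(frame1, frame2)  # list of progressively refined flow fields
flow = flow_iters[-1].squeeze(0)  # (2, H, W): per-pixel (dx, dy) motion

torch.save(flow.cpu(), "flows/00000.pt")
```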
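For step 3, the exact generation settings behind DUTS-Video are not listed here; a minimal sketch with the Hugging Face diffusers pipeline (the checkpoint ID, resolution, and frame count are assumptions) looks like:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

# Animate a single static DUTS image into a short clip (placeholder path;
# 1024x576 is SVD's native resolution)
image = load_image("DUTS/sample.jpg").resize((1024, 576))
frames = pipe(image, num_frames=14, decode_chunk_size=4).frames[0]
export_to_video(frames, "sample_generated.mp4", fps=7)
```

Optical flow maps for the generated frames can then be estimated with the RAFT sketch above.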
Start TransFlow training with:
python run.py --train
Verify the following before running:
✅ Training dataset selection and configuration
✅ GPU availability and configuration
✅ Backbone network selection
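For the GPU item above, a quick repo-agnostic preflight check can catch a misconfigured environment before a long run:

```python
import torch

# Confirm at least one CUDA device is visible before launching training
assert torch.cuda.is_available(), "No CUDA device found (check drivers / CUDA_VISIBLE_DEVICES)"
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```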
Start TransFlow testing with:
python run.py --test
Verify the following before running:
✅ Testing dataset selection
✅ GPU availability and configuration
✅ Backbone network selection
✅ Pre-trained model path
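For the pre-trained model path item, a minimal sanity check that a downloaded checkpoint (links below) deserializes correctly; the file name is a placeholder and the actual key layout depends on how this repo saves its models:

```python
import torch

# Placeholder file name; inspect the top-level entries to confirm the download is intact
state = torch.load("trained_model/TransFlow_mitb2.pth", map_location="cpu")
print(type(state).__name__, "with", len(state), "top-level entries")
```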
Pre-trained model (mitb0)
Pre-trained model (mitb1)
Pre-trained model (mitb2)
Pre-computed results
Code and models are only available for non-commercial research purposes.
For questions or inquiries, feel free to contact:
E-mail: [email protected]