Paper's link: Two-Stream Convolutional Networks for Action Recognition
The backbone of each stream is ResNet-50.
| Stream | Accuracy | 
|---|---|
| RGB | - | 
| Optical Flow | - | 
| Fusion (Two-Stream) | 73.53% (the flow input stacks only 4 optical flow images: 2 x-direction, 2 y-direction) | 
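The Fusion row notes that the flow stream sees only 4 stacked optical flow images. Below is a minimal NumPy sketch of how such a 4-channel input could be assembled. It assumes that each flow component is saved as an 8-bit grayscale image already loaded into an H×W array. It also assumes an interleaved x, y, x, y channel order, because the order used by the training code is not stated here. `stack_flow` is a hypothetical helper name, not part of the repository.

```python
import numpy as np


def stack_flow(flow_x, flow_y):
    """Interleave x/y flow images into a (2 * L, H, W) float32 array.

    flow_x, flow_y: lists of L arrays of shape (H, W), one per frame pair.
    The x, y, x, y channel order is an assumption for illustration.
    """
    if len(flow_x) != len(flow_y):
        raise ValueError("need the same number of x and y flow images")
    channels = []
    for fx, fy in zip(flow_x, flow_y):
        channels.append(np.asarray(fx, dtype=np.float32))
        channels.append(np.asarray(fy, dtype=np.float32))
    stacked = np.stack(channels, axis=0)
    # Scale 8-bit pixel values to [0, 1] before feeding the network.
    return stacked / 255.0


# Two frame pairs -> 2 x-direction + 2 y-direction images = 4 channels.
h, w = 224, 224
flow_x = [np.full((h, w), 128, dtype=np.uint8) for _ in range(2)]
flow_y = [np.full((h, w), 64, dtype=np.uint8) for _ in range(2)]
x = stack_flow(flow_x, flow_y)
print(x.shape)  # (4, 224, 224)
```

If the flow stream's ResNet-50 receives this array as input (after adding a batch dimension), its first convolution has to accept 4 input channels instead of the usual 3.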
Environment:
- Ubuntu 16.04.7 LTS
- CUDA 10.1
- PyTorch 1.3.1
- torchvision 0.4.2
- numpy 1.19.2
- Pillow 8.0.1
- Python 3.6.12
 
Original dataset: UCF101
I also wrote MATLAB code to generate the optical flow images and the RGB images:
- For the optical flow images, I use MATLAB's Horn–Schunck algorithm function to compute the flow. The frame interval used when computing the optical flow images is set to 2 to generate sufficient data.
- For the RGB images, I randomly sample a single frame from each video.
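The flow images above come from MATLAB's Horn–Schunck function. For readers without MATLAB, here is a minimal NumPy sketch of the classic Horn–Schunck iteration. It is not the code in calOpticalFlow.m. The names `horn_schunck`, `alpha` and `num_iters` and their defaults are illustrative assumptions, so the results will not match MATLAB's output exactly.

```python
import numpy as np


def _neighbor_average(f):
    """Horn-Schunck local average: 1/6 for edge neighbours, 1/12 for corners."""
    p = np.pad(f, 1, mode="edge")  # replicate borders
    sides = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    corners = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    return sides / 6.0 + corners / 12.0


def horn_schunck(frame1, frame2, alpha=1.0, num_iters=100):
    """Estimate dense flow (u, v) from frame1 to frame2 (grayscale H x W arrays).

    alpha weights the smoothness term; larger values give smoother flow.
    """
    f1 = np.asarray(frame1, dtype=np.float64)
    f2 = np.asarray(frame2, dtype=np.float64)
    # Spatial derivatives of the mean frame (np.gradient returns d/drow, d/dcol)
    iy, ix = np.gradient((f1 + f2) / 2.0)
    # Temporal derivative between the two frames
    it = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + ix ** 2 + iy ** 2
    for _ in range(num_iters):
        u_avg = _neighbor_average(u)
        v_avg = _neighbor_average(v)
        # Normalised brightness-constancy error of the averaged flow
        correction = (ix * u_avg + iy * v_avg + it) / denom
        u = u_avg - ix * correction
        v = v_avg - iy * correction
    return u, v


# Toy example: a Gaussian blob that moves 1 pixel to the right.
coords = np.arange(33)
X, Y = np.meshgrid(coords, coords)
frame_a = 255.0 * np.exp(-((X - 16) ** 2 + (Y - 16) ** 2) / 18.0)
frame_b = 255.0 * np.exp(-((X - 17) ** 2 + (Y - 16) ** 2) / 18.0)
u, v = horn_schunck(frame_a, frame_b)
print("mean u = %.3f, mean v = %.3f" % (u.mean(), v.mean()))
```

For the rightward shift, the horizontal component u comes out positive. The vertical component v averages to roughly zero because the motion has no vertical part. Before these values can be saved as images, they would still have to be rescaled to 8-bit images. That step, like the exact frame pairing, is handled in calOpticalFlow.m and is not reproduced here.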
Data generation code (MATLAB): calOpticalFlow.m
Download the processed data: Link (password: peyu)
After downloading the processed data, extract processedData.rar and create a directory named data so that the project is laid out as follows:
Project
│--- data
│------ RGB
│------ OpticalFlow
│--- other files
Before training, create a directory named model to hold the checkpoint files, then run:
python3 trainTwoStreamNet.py
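The layout above (data/RGB, data/OpticalFlow, plus a model/ directory for checkpoints) can be checked before launching trainTwoStreamNet.py. Here is a minimal sketch. `prepare_project` is a hypothetical helper that is not part of the repository. It only checks that the folders exist and does not validate their contents.

```python
from pathlib import Path


def prepare_project(root="."):
    """Check the expected data folders and create model/ for checkpoints.

    Returns the list of expected data folders that are missing.
    """
    root = Path(root)
    expected = [root / "data" / "RGB", root / "data" / "OpticalFlow"]
    missing = [str(p) for p in expected if not p.is_dir()]
    # Checkpoints such as model/checkpoint-9000.pth are stored here.
    (root / "model").mkdir(parents=True, exist_ok=True)
    return missing
```

Calling `prepare_project()` from the project root returns an empty list when both data folders are in place. Otherwise it returns the paths that still need to be extracted. In both cases it makes sure model/ exists.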
demo.py runs the trained two-stream network on a single test video. For this demo I arbitrarily set test_video_id = 1000 (an index into the test set) and used the checkpoint saved at the 9000th iteration as the demo model.
You can change the test_video_id here:

    # set the test video id in the test set
    test_video_id = 1000
    print('Video Name:', LoadUCF101Data.TestVideoNameList[test_video_id])

You can change the checkpoint file path here:

    # load the checkpoint file
    state = torch.load('model/checkpoint-9000.pth')
    twoStreamNet.load_state_dict(state['model'])

Run the demo.py file:

    CUDA_VISIBLE_DEVICES=0 python3 demo.py

Output:

    Video Name: v_Drumming_g01_c05
    actual class is Drumming
    predicted class is Drumming , probability is 99.9534
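This README does not state how demo.py combines the two streams. One common choice, and one of the fusion options in the two-stream paper, is to average the per-class softmax scores of the RGB network and the optical-flow network. The sketch below assumes that rule and uses NumPy to reproduce the shape of the demo's printout. The scores are toy values, not real model outputs, and `softmax` and `fuse_and_predict` are hypothetical helper names.

```python
import numpy as np


def softmax(scores):
    # Subtract the max for numerical stability.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()


def fuse_and_predict(rgb_scores, flow_scores, class_names):
    """Average the two streams' softmax outputs; return (label, percent)."""
    probs = (softmax(rgb_scores) + softmax(flow_scores)) / 2.0
    best = int(np.argmax(probs))
    return class_names[best], 100.0 * float(probs[best])


# Toy subset of UCF101 labels and made-up class scores for each stream.
class_names = ["ApplyEyeMakeup", "Drumming", "Typing"]
rgb_scores = np.array([0.5, 4.0, 1.0])
flow_scores = np.array([0.2, 6.0, 0.1])
label, percent = fuse_and_predict(rgb_scores, flow_scores, class_names)
print("predicted class is", label, ", probability is %.4f" % percent)
```

Averaging probabilities rather than raw scores keeps the two streams on the same scale. It also means the printed probability is a percentage of the fused distribution, as in the demo's "99.9534".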
I recorded some of the problems I ran into while writing the code, along with their solutions. Sorry that these notes are only available in Chinese! Here is the Link.

