This repository is no longer maintained and will eventually be closed. Please move to OpenPose!
C++ code repo for the ECCV 2016 demo, "Realtime Multiperson Pose Estimation", Zhe Cao, Shih-En Wei, Tomas Simon, Yaser Sheikh. Thanks to Ginés Hidalgo Martínez for restructuring the code.
The full project repo includes MATLAB and Python versions, as well as the training code.
This project is under the terms of the license.
1. Required: CUDA and cuDNN installed on your machine.
2. If OpenCV 2.4 is installed on your system, go to step 3. If you are using OpenCV 3, uncomment the line `# OPENCV_VERSION := 3` in the file `Makefile.config.Ubuntu14.example` (for Ubuntu 14) and/or `Makefile.config.Ubuntu16.example` (for Ubuntu 15 or 16). In addition, OpenCV 3 does not include the `opencv_contrib` module by default. If you have installed it manually and need to use it, append `opencv_contrib` to the end of the line `LIBRARIES += opencv_core opencv_highgui opencv_imgproc` in the `Makefile`.
3. Build `caffe` and `rtpose.bin`, and download the required Caffe models (script tested on Ubuntu 14.04 and 16.04; it uses all the available cores on your machine):
chmod u+x install_caffe_and_cpm.sh
./install_caffe_and_cpm.sh
./build/examples/rtpose/rtpose.bin --video video_file.mp4
./build/examples/rtpose/rtpose.bin
--help <--- It displays all the available options.
--video input.mp4 <--- Input video. If omitted, will use webcam.
--camera # <--- Choose webcam number (default: 0).
--image_dir path_to_images/ <--- Run on all jpg, png, or bmp images in path_to_images/. If omitted, will use webcam.
--write_frames path/ <--- Render images with this prefix: path/frame%06d.jpg
--write_json path/ <--- Output JSON file with joints with this prefix: path/frame%06d.json
--no_frame_drops <--- Don't drop frames. Important for making offline results.
--no_display <--- Don't open a display window. Useful if there's no X server.
--num_gpu 4 <--- Parallelize over this number of GPUs. Default is 1.
--num_scales 3 --scale_gap 0.15 <--- Use 3 scales: 1, (1-0.15), and (1-0.15*2). Default is a single scale of 1.
(HD)
--net_resolution 656x368 --resolution 1280x720 (These are the default values.)
(VGA)
--net_resolution 496x368 --resolution 640x480
--logtostderr <--- Log messages to standard error.
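The `frame%06d` part of the `--write_frames` and `--write_json` prefixes is a printf-style pattern: the frame index is zero-padded to six digits. A minimal Python sketch of the resulting paths is shown here; the helper name `output_paths` is hypothetical, and which index the first frame receives is not stated in this README.

```python
def output_paths(prefix, frame_index):
    """Return the (image, JSON) paths for one frame under the given prefix."""
    # "%06d" pads the frame index with zeros to a width of six digits.
    name = "frame%06d" % frame_index
    return prefix + name + ".jpg", prefix + name + ".json"


print(output_paths("output/", 12))
# ('output/frame000012.jpg', 'output/frame000012.json')
```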
Run on a video vid.mp4, render image frames as output/frame%06d.jpg and output JSON files as output/frame%06d.json, using 3 scales (1.00, 0.85, and 0.70), parallelized over 2 GPUs:
./build/examples/rtpose/rtpose.bin --video vid.mp4 --num_gpu 2 --no_frame_drops --write_frames output/ --write_json output/ --num_scales 3 --scale_gap 0.15
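The scale list follows directly from `--num_scales` and `--scale_gap`. The sketch below reproduces the arithmetic in Python, assuming the default starting scale of 1; the function name `search_scales` is hypothetical and is not part of `rtpose.bin`.

```python
def search_scales(num_scales, scale_gap):
    """Scale i is 1 - i * scale_gap, for i = 0 .. num_scales - 1."""
    return [1.0 - i * scale_gap for i in range(num_scales)]


# Same settings as the command above: --num_scales 3 --scale_gap 0.15
for scale in search_scales(num_scales=3, scale_gap=0.15):
    print("%.2f" % scale)
```

With these settings the loop prints 1.00, 0.85, and 0.70, matching the scales named in the example.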
Each JSON file has a `bodies` array of objects, where each object has a `joints` array containing the joint locations and detection confidences, formatted as x1,y1,c1,x2,y2,c2,..., where c is the confidence in [0,1].
{
"version":0.1,
"bodies":[
{"joints":[1114.15,160.396,0.846207,...]},
{"joints":[...]}
]
}
where the joint order of the COCO parts is as follows (see `src/rtpose/modelDescriptorFactory.cpp`):
part2name {
{0, "Nose"},
{1, "Neck"},
{2, "RShoulder"},
{3, "RElbow"},
{4, "RWrist"},
{5, "LShoulder"},
{6, "LElbow"},
{7, "LWrist"},
{8, "RHip"},
{9, "RKnee"},
{10, "RAnkle"},
{11, "LHip"},
{12, "LKnee"},
{13, "LAnkle"},
{14, "REye"},
{15, "LEye"},
{16, "REar"},
{17, "LEar"},
{18, "Bkg"},
}
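For post-processing, the flat `joints` array can be regrouped into (x, y, confidence) triples and labeled with the part names above. This is a minimal Python sketch, assuming the written file is valid JSON and that triples appear in the listed part order. The function name `named_joints` is hypothetical, and the second joint in the sample is made up for illustration.

```python
import json

# Part names in the order listed above (index 18, "Bkg", is the background).
COCO_PARTS = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder", "LElbow",
    "LWrist", "RHip", "RKnee", "RAnkle", "LHip", "LKnee", "LAnkle",
    "REye", "LEye", "REar", "LEar", "Bkg",
]


def named_joints(json_text):
    """Return one {part name: (x, y, confidence)} dict per detected body."""
    bodies = []
    for body in json.loads(json_text)["bodies"]:
        flat = body["joints"]
        # Regroup the flat x1,y1,c1,x2,y2,c2,... list into (x, y, c) triples.
        triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
        # zip stops at the shorter sequence, so unused names are dropped.
        bodies.append(dict(zip(COCO_PARTS, triples)))
    return bodies


# Shortened, partly made-up sample: one body with only two joints.
sample = ('{"version":0.1,"bodies":[{"joints":'
          '[1114.15,160.396,0.846207,1100.0,210.5,0.9]}]}')
for name, (x, y, c) in named_joints(sample)[0].items():
    print(name, x, y, c)
```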
We modified and added several Caffe files in include/caffe and src/caffe. In case you want to use your own Caffe distribution, these are the files we added and modified:
- Added folders in `include/caffe` and `src/caffe`: `include/caffe/cpm` and `src/caffe/cpm`.
- Modified files in `include/caffe` (search for `// CPM extra code:` to find the modified code): `data_transformer.hpp`.
- Modified files in `src/caffe` (search for `// CPM extra code:` to find the modified code): `data_transformer.cpp`, `proto/caffe.proto` and `util/blocking_queue.cpp`.
- Replaced files: `README.md`.
- Added files: `install_caffe_and_cpm.sh`, `Makefile.config.Ubuntu14.example` (extracted from `Makefile.config.example`) and `Makefile.config.Ubuntu16.example` (extracted from `Makefile.config.example`).
- Other added folders: `model/`, `examples/rtpose`, `/include/rtpose` and `/src/rtpose`.
- Other modified files: `Makefile`.
- Optional - deleted Caffe files and folders (only to save space): `Makefile.config.example`, `data/`, `examples/` (do not delete `examples/rtpose`) and `models/`.
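When porting the changes to your own Caffe distribution, it can help to list every occurrence of the `// CPM extra code:` marker. The Python sketch below does that; the function name `find_cpm_markers` is hypothetical, and the script assumes it is run from the repository root.

```python
import os

MARKER = "// CPM extra code:"


def find_cpm_markers(root):
    """Yield (path, line number, line) for every line containing MARKER."""
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            # errors="ignore" skips undecodable bytes in non-text files.
            with open(path, errors="ignore") as handle:
                for number, line in enumerate(handle, start=1):
                    if MARKER in line:
                        yield path, number, line.strip()


if __name__ == "__main__":
    # The two folders named in the list above.
    for root in ("include/caffe", "src/caffe"):
        for path, number, line in find_cpm_markers(root):
            print("%s:%d: %s" % (path, number, line))
```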
We created a few Caffe layers (located in include/caffe/cpm/layers and src/caffe/cpm/layers):
- ImResizeLayer: Only used for testing (backward pass not implemented). This layer performs a 2-D resize over the 4-D data. That is, given a 4-D input of size (`num` x `channels` x `height_input` x `width_input`), the layer returns a 4-D output of size (`num` x `channels` x `height_output` x `width_output`). It is applied independently to each element along the `num` and `channels` dimensions. Its parameters are:
  - `factor`: Scaling factor with respect to the input width and height. `factor` is the alternative to the pair of variables [`target_spatial_width`, `target_spatial_height`]. If `factor != 0`, the latter are ignored.
  - `scale_gap` and `start_scale`: These parameters are related and are used for scale search in testing mode. If `start_scale = 1` (default), the CNN input patch size is the net resolution (set with `--net_resolution`). `scale_gap` sets the difference between consecutive scales. These parameters are related to the flag `--num_scales`. For instance, `--start_scale 1 --num_scales 3 --scale_gap 0.1` means using 3 scales: 1, 1-0.1, and 1-2*0.1. The patch sizes are therefore the net resolution multiplied by these scale values.
  - `target_spatial_height`: Alternative to `factor`. It sets the output height. Ignored if `factor != 0`.
  - `target_spatial_width`: Alternative to `factor`. It sets the output width. Ignored if `factor != 0`.
- NmsLayer: Only used for testing (backward pass not implemented). This layer performs 3-D Non-Maximum Suppression over the 4-D data. That is, given a 4-D input of size (`num` x `channels` x `height` x `width`), it returns a 4-D output of size (`num` x `num_parts` x `max_peaks+1` x 3). It is applied independently to each element along the `num` dimension. The second dimension corresponds to the number of limbs (`num_parts`). The third dimension indicates the maximum number of peaks to be analyzed (`max_peaks+1`). Finally, the last one corresponds to the `x`, `y` and `score` values (3). Its parameters are:
  - `max_peaks`: The number of peaks to be considered. The last `total_peaks - max_peaks` peaks are discarded.
  - `num_parts`: The number of limbs to detect (e.g. 15 for MPI and 18 for COCO).
  - `threshold`: Any input value smaller than this threshold is set to 0.
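To make the shapes and parameters of the two layers concrete, here is a rough pure-Python sketch. It is not the layers' own code. The function names, the truncation of scaled sizes to integers, and the strict 8-neighbour peak test are assumptions for illustration. The peak finder handles a single 2-D channel and returns a plain list, not the (`max_peaks+1` x 3) output block.

```python
def imresize_output_shape(input_shape, factor=0.0,
                          target_spatial_width=0, target_spatial_height=0):
    """Output shape for a (num, channels, height, width) input."""
    num, channels, height, width = input_shape
    if factor != 0:
        # Assumption: scaled sizes are truncated to integers.
        return num, channels, int(height * factor), int(width * factor)
    # factor == 0: fall back to the explicit target size.
    return num, channels, target_spatial_height, target_spatial_width


def nms_peaks(heatmap, threshold, max_peaks):
    """Return up to max_peaks (x, y, score) local maxima of one channel."""
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0

    def value(y, x):
        # Out-of-range positions and values below the threshold count as 0.
        if 0 <= y < height and 0 <= x < width and heatmap[y][x] >= threshold:
            return heatmap[y][x]
        return 0.0

    peaks = []
    for y in range(height):
        for x in range(width):
            score = value(y, x)
            neighbours = [value(y + dy, x + dx)
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            # Assumption: a peak is strictly greater than its 8 neighbours.
            if score > 0 and all(score > n for n in neighbours):
                peaks.append((x, y, score))
    # Peaks beyond the first max_peaks (in scan order) are discarded.
    return peaks[:max_peaks]


heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.04],
]
print(imresize_output_shape((1, 19, 46, 82), factor=8))
print(nms_peaks(heatmap, threshold=0.05, max_peaks=10))
```

For a full network output, `nms_peaks` would be called once per part channel, which is where `num_parts` enters.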
Please cite the paper in your publications if it helps your research:
@article{cao2016realtime,
title={Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
author={Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
journal={arXiv preprint arXiv:1611.08050},
year={2016}
}
@inproceedings{wei2016cpm,
author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
booktitle = {CVPR},
title = {Convolutional pose machines},
year = {2016}
}