How to Recognize Gestures and Actions
We use the MPII model to recognize human gestures and actions. The MPII model is built on a CNN architecture called VGG. The structure of VGG is:
As shown in the image above:
- The white boxes: convolutional layers.
- The red boxes: pooling layers.
- The other boxes: detection (output) layers. The output layer has one channel per class; for example, a model trained on ImageNet has 1000 output channels.
However, since we use the MPII model, which detects human body keypoints rather than object classes, the output layer has 15 channels, one per keypoint.
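Each output channel is a confidence heatmap for one keypoint, so turning the network output into (x, y) positions means finding the peak of each heatmap and rescaling it to the original image size. Below is a minimal sketch of that decoding step, assuming the output is a NumPy array of shape (num_parts, H, W); the function name and the 0.1 threshold are illustrative choices, not from the original code.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps, img_w, img_h, threshold=0.1):
    """Pick the most confident location in each heatmap and rescale
    it to original-image coordinates.

    heatmaps: array of shape (num_parts, H, W), one heatmap per keypoint.
    Returns a list with an (x, y) tuple per part, or None if the peak
    confidence is below the threshold.
    """
    num_parts, h, w = heatmaps.shape
    points = []
    for part in range(num_parts):
        heatmap = heatmaps[part]
        # Location of the highest-confidence cell in this heatmap.
        row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        conf = heatmap[row, col]
        if conf > threshold:
            # Map heatmap coordinates back to image coordinates.
            x = int(col * img_w / w)
            y = int(row * img_h / h)
            points.append((x, y))
        else:
            points.append(None)  # keypoint not confidently detected
    return points

# Toy example: one 4x4 heatmap with a single confident peak.
demo = np.zeros((1, 4, 4), dtype=np.float32)
demo[0, 1, 2] = 0.9
print(keypoints_from_heatmaps(demo, img_w=400, img_h=400))  # → [(200, 100)]
```

With the real model, the same function would be called with all 15 heatmaps at once, giving one point (or None) per body part.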
Implementation
- Recognize gestures and actions with Caffe.
I used the Caffe framework for the implementation.
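One convenient way to run a released Caffe model without a full Caffe build is OpenCV's DNN module, which can load Caffe `.prototxt`/`.caffemodel` pairs directly. The sketch below assumes the publicly released OpenPose MPI model files and a 368x368 input size; the file names, input size, and image path are assumptions, not taken from the original write-up.

```python
import os

# Assumed file names from the publicly released OpenPose MPI model --
# adjust these paths to wherever you downloaded the model.
PROTO = "pose_deploy_linevec_faster_4_stages.prototxt"
MODEL = "pose_iter_160000.caffemodel"

def run_pose(image_path, proto=PROTO, model=MODEL):
    """Run the Caffe pose network on one image and return the raw
    output tensor of heatmaps, or None if the model files are missing."""
    if not (os.path.exists(proto) and os.path.exists(model)):
        return None  # model files not downloaded yet
    import cv2  # imported lazily so the sketch loads without OpenCV
    net = cv2.dnn.readNetFromCaffe(proto, model)
    frame = cv2.imread(image_path)
    # Normalize to [0, 1] and resize to the network's expected input.
    blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368),
                                 (0, 0, 0), swapRB=False, crop=False)
    net.setInput(blob)
    return net.forward()  # shape: (1, channels, H, W) heatmaps

print(run_pose("person.jpg"))
```

The returned tensor holds one heatmap per keypoint channel, which can then be decoded into image coordinates by taking the peak of each channel.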