As I continue my research in deep learning and computer vision, I want to learn more about the limitations of one of the most popular methods for estimating human poses: CMU’s OpenPose system.
As shown in the video above, this method can produce impressive results when the subject’s entire body is visible to the camera.
However, at various points in this second video, the system is unable to detect the subject’s left leg.
My question is: are there other factors that can trick the system?
To answer this question, I wrote a script that allowed me to process various images using this PyTorch implementation of OpenPose as a back-end.
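The script itself isn't shown here, so what follows is only a minimal sketch of that kind of batch driver. It assumes the back-end is wrapped in a hypothetical `estimate_pose` callable that takes an image path and returns one list of `(x, y, confidence)` keypoints per detected person. The real PyTorch implementation has its own API, which will differ. Undetected keypoints are assumed to come back as all zeros, the convention OpenPose uses in its JSON output.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# A keypoint is (x, y, confidence); a person is a list of keypoints.
# Assumption: undetected keypoints are reported as (0.0, 0.0, 0.0).
Keypoint = Tuple[float, float, float]
Person = List[Keypoint]
Estimator = Callable[[Path], List[Person]]  # hypothetical back-end wrapper

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def process_images(paths: Iterable[Path], estimate_pose: Estimator) -> Dict[str, List[Person]]:
    """Run the pose back-end on every image file, skipping non-image files."""
    results: Dict[str, List[Person]] = {}
    for path in sorted(paths):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = estimate_pose(path)
    return results


def summarize(results: Dict[str, List[Person]]) -> Dict[str, List[int]]:
    """For each image, count how many keypoints went undetected for each person."""
    return {
        name: [sum(1 for (_, _, c) in person if c == 0.0) for person in people]
        for name, people in results.items()
    }
```

For example, `summarize(process_images(Path("images").iterdir(), my_backend))`, where `my_backend` is your own wrapper, gives a quick per-image count of missing joints. That kind of count makes cases like the undetected left leg easy to find across many images.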
Here are a few of the interesting results that I observed:
For my first experiment, I staged a photo to test whether occlusion limits the system’s ability to detect multiple subjects. (It does.)
Here, Aaron is holding his right hand behind his back, and Alexandra is standing with her left foot behind Aaron’s leg and her left hand in front of his torso.
OpenPose identifies Alexandra’s left ankle in the wrong place. It should be behind Aaron’s right leg.
The biggest surprise for me is that OpenPose does not appear to enforce a one-to-one relationship between joints: it connects both Alexandra’s left elbow and Aaron’s right elbow to Alexandra’s left wrist.
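One way to look for this kind of joint sharing programmatically is to check whether two detected people have a keypoint at the same image location. The sketch below assumes OpenPose-style JSON output, where each entry in `people` has a flat `pose_keypoints_2d` list of `x, y, confidence` triples and undetected keypoints are all zeros. The function names and the pixel tolerance `tol` are hypothetical choices made for illustration.

```python
import itertools
import math
from typing import Dict, List, Tuple

# (person_a, person_b, keypoint_index_in_a, keypoint_index_in_b)
SharedJoint = Tuple[int, int, int, int]


def keypoints(person: Dict) -> List[Tuple[float, float, float]]:
    """Split a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, c) triples."""
    flat = person["pose_keypoints_2d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def find_shared_joints(frame: Dict, tol: float = 1.0) -> List[SharedJoint]:
    """Return keypoints of different people that lie within tol pixels of each other."""
    people = [keypoints(p) for p in frame["people"]]
    shared: List[SharedJoint] = []
    for a, b in itertools.combinations(range(len(people)), 2):
        for i, (xa, ya, ca) in enumerate(people[a]):
            if ca == 0.0:  # skip undetected keypoints so all-zero entries never match
                continue
            for j, (xb, yb, cb) in enumerate(people[b]):
                # Compare every keypoint pair, since a wrist of one person
                # may be reused as a different joint index of another.
                if cb > 0.0 and math.hypot(xa - xb, ya - yb) <= tol:
                    shared.append((a, b, i, j))
    return shared
```

All index pairs are compared, not just matching indices, because a shared joint can show up under different labels for the two people. A larger `tol` will also flag genuinely distinct joints that happen to be close together, so it is a trade-off rather than a fixed setting.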
What about photos that were not staged to trick the system?
This photo of a video shoot at our lab shows eight people, each standing close to at least one other person in the shoot.
Below, we can observe a few instances of the one-to-many joint sharing issue identified above.
The two men on the left share an elbow, and the two men on the right share a wrist. Meanwhile, the man interviewing Ken shares his ears with the painting of Ayara on the poster behind him.