PoseNet can detect either a single pose or multiple poses of the subjects present in an image; the single-pose detector is faster and simpler than its multi-pose counterpart. For the sake of simplicity, the working demo is configured for the single-pose detection problem. At a high level, pose estimation consists of two parts: a model that processes an image and outputs heatmaps and offset vectors, and a decoding algorithm that turns those outputs into poses.
What is a poser?

A poser is a subject of the image that has a humanoid pose; humans and human-like figures can both be considered posers. Before the posture of a subject can be determined, the subject itself has to be identified. A confidence score between 0.0 and 1.0 can be used as a threshold to set apart poses that are not deemed strong enough.

What is a key-point?
Key-points are markers on a human's body that can be used to determine its posture. PoseNet detects 17 such key-points: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, and so on. PoseNet makes its estimation in two dimensions, so only the x and y coordinates are returned. A confidence score between 0.0 and 1.0 is attached to each key-point as well.
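As an illustrative sketch (not the PoseNet library's actual API), a detected pose can be modeled as a list of 17 key-points, each carrying a part name, a 2D position, and a confidence score that can be thresholded exactly as described above. The part names below follow the ones commonly listed for PoseNet:

```python
# Sketch of a pose as keypoint records; structure is illustrative, not the real API.
KEYPOINT_PARTS = [
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
]

def confident_keypoints(keypoints, threshold=0.5):
    """Keep only key-points whose confidence score clears the threshold."""
    return [kp for kp in keypoints if kp["score"] >= threshold]

# Example: two detected key-points, only one of which clears the 0.5 threshold.
detected = [
    {"part": "nose", "position": {"x": 120.4, "y": 88.1}, "score": 0.93},
    {"part": "leftEar", "position": {"x": 101.0, "y": 90.2}, "score": 0.31},
]
print([kp["part"] for kp in confident_keypoints(detected)])  # ['nose']
```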
From TomTom's Spark to Jabra's Elite Sport, to all manner of Fitbit and Garmin wearables, there are already gadgets that attempt to tell you how fit you are. With AI, this fitness gauging should become far more sophisticated, flexible, and useful at work, for example by prompting individuals to maintain the right body posture throughout the day.
In sports, AI can be used to evaluate exercises performed during training or to analyse on-field body movements. Such measurements can feed into intelligent, optimised training methods that automatically assess exercise technique, investigate the quality of execution, and provide athletes and coaches with appropriate feedback.
This module should be used when only one human or human-like figure forms the subject of the image. Following are the inputs of the detector for single-pose detection:
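The input list itself appears to be missing here. As a rough sketch, modeled on the parameters exposed by the TensorFlow.js PoseNet API (names and defaults below are assumptions, not taken from this article), the single-pose detector takes an image plus a few tuning parameters:

```python
# Hypothetical sketch of the single-pose detector's inputs; names and defaults
# are assumptions modeled on the TensorFlow.js PoseNet API.
single_pose_inputs = {
    "image": None,               # the source image or video frame
    "image_scale_factor": 0.5,   # scale the image down before it enters the network
    "flip_horizontal": False,    # mirror the keypoints, e.g. for webcam video
    "output_stride": 16,         # 8, 16, or 32; smaller = slower but more accurate
}
```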
At a high level, PoseNet generates a heatmap giving the probability, or confidence score, that a key-point is present in each region of the image. When PoseNet processes an image, this heatmap, together with a set of offset vectors, is used to decode the areas with high confidence of containing key-points. The heatmap is a 3D tensor whose resolution is given by the formula:

Resolution = ((Input_size - 1) / Output_stride) + 1
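The formula above can be checked with a quick calculation; for instance, a 225-pixel input with an output stride of 16 yields a 15x15 heatmap:

```python
def heatmap_resolution(input_size, output_stride):
    """Resolution = ((input_size - 1) / output_stride) + 1."""
    return (input_size - 1) // output_stride + 1

# A 225x225 input with output stride 16 gives a 15x15 heatmap;
# raising the stride to 32 shrinks it to 8x8 (faster, less accurate).
print(heatmap_resolution(225, 16))  # 15
print(heatmap_resolution(225, 32))  # 8
```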
The offset vectors form a 3D tensor of dimension (resolution x resolution x 34), the depth being twice the number of key-points. Since the heatmap only approximates where the key-points are, the offset vectors corresponding to the key-points found in the heatmap are used to shift each prediction to its exact location in the image.

Heatmap and Offset Vector Simplification
Visualization of the heatmap and offset-vector tensors. The depth of the heatmap tensor is 17, since 17 key-points are to be located, so its dimension is (resolution x resolution x 17). PoseNet uses the MobileNet architecture to carry out this task; the image below shows the trade-off between speed and accuracy.

Output Stride and Heatmap Resolution
So once the heatmap is generated, how do we decode it to get the poses?
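A minimal sketch of that decoding step, built from the description above: for each of the 17 key-point channels, take the highest-confidence heatmap cell, multiply its coordinates by the output stride, and refine the result with the matching offset vector. The channel layout assumed here (y-offsets in the first 17 channels, x-offsets in the last 17) is an assumption for illustration, not a detail stated in this article:

```python
def decode_single_pose(heatmaps, offsets, output_stride):
    """Sketch of single-pose decoding.
    heatmaps[y][x][k]: confidence that key-point k is near heatmap cell (y, x).
    offsets[y][x][c]:  c = k holds the y-offset, c = k + num_keypoints the
                       x-offset (channel layout is an assumption).
    """
    res = len(heatmaps)
    num_keypoints = len(heatmaps[0][0])
    keypoints = []
    for k in range(num_keypoints):
        # Highest-confidence cell for this key-point channel.
        best_y, best_x = max(
            ((y, x) for y in range(res) for x in range(res)),
            key=lambda yx: heatmaps[yx[0]][yx[1]][k],
        )
        # Scale back to image space, then refine with the offset vector.
        keypoints.append({
            "y": best_y * output_stride + offsets[best_y][best_x][k],
            "x": best_x * output_stride + offsets[best_y][best_x][k + num_keypoints],
            "score": heatmaps[best_y][best_x][k],
        })
    return keypoints
```

With a stride of 16, a key-point peaking at heatmap cell (0, 1) lands near pixel (0, 16) and is then nudged by its offsets to the exact location.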