I have tried to build the Convolutional Pose Machines model from this paper (https://arxiv.org/pdf/1602.00134.pdf). The model works fine and outputs 15 heatmaps (one per keypoint plus one for the background). From these heatmaps I can calculate the keypoint positions (simply the location of the maximum value in each heatmap).
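
For concreteness, the peak extraction described above can be sketched roughly as follows. This is only an illustration in plain Python: the helper name `heatmap_peak` and the tiny 3x4 heatmap are made up for the example and are not taken from the paper or from any library.

```python
def heatmap_peak(heatmap):
    """Return (row, col, value) of the largest entry in a 2D heatmap.

    `heatmap` is assumed to be a non-empty list of equally long rows.
    """
    best_row, best_col, best_val = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_row, best_col, best_val = r, c, value
    return best_row, best_col, best_val


# Hypothetical heatmap for a single keypoint (illustrative values only).
heatmap = [
    [0.01, 0.02, 0.01, 0.00],
    [0.03, 0.80, 0.10, 0.01],
    [0.02, 0.05, 0.02, 0.00],
]

row, col, peak = heatmap_peak(heatmap)
print(f"keypoint at (row={row}, col={col}), peak value={peak}")
```

The position is the index of the peak; the peak value itself is the number the question below is about.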

My question is: Is this maximum value in the heatmap also equal to the confidence score of the model that the keypoint is in the image?

Maybe this is a dumb question but in the paper the authors don't mention how they calculate the confidence score or how they handle non-visible keypoints.

  • Hi Marc, first of all, thanks for your answer. I'm still not sure if this is the right thing to do (in the case of the model I use). I'm currently testing the model on different datasets and I want to experiment with your solution (sigmoid) and look at the results in more detail to verify it. So your answer was helpful, but in the case of the Convolutional Pose Machines model I'm still not sure if that is the right approach. If it is, I will accept your answer. – Johann Gerberding Nov 12 '20 at 11:11
  • well noted. appreciate your feedback. – Marc Nov 12 '20 at 11:22

1 Answer

The best way to answer this, I believe, is to dig into the actual code of popular convolutional pose estimation models and see how it is done in practice.

The Google TensorFlow PoseNet model should be a good example.

In their open-source code (check out the predict method), they apply a sigmoid activation function to the 2D heatmap of each keypoint of the pose.

So, to answer your question, I would say that the maximum value in the heatmap is not directly equal to the confidence score. The output of the sigmoid function at that location is, since it gives a proper score between 0 and 1.
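
As a rough illustration of that idea (not the actual PoseNet code), here is a minimal sketch in plain Python. It assumes the raw heatmap values behave like logits, which may not hold for every model. The names `sigmoid` and `keypoint_with_score` and the example values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def keypoint_with_score(heatmap):
    """Return ((row, col), score) for one keypoint heatmap.

    The position is the location of the largest raw value; the score is
    the sigmoid of that raw value (assumption: raw values act like logits).
    """
    peak_pos, peak_val = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > peak_val:
                peak_pos, peak_val = (r, c), value
    return peak_pos, sigmoid(peak_val)


# Hypothetical raw heatmap for one keypoint (illustrative values only).
raw_heatmap = [
    [-4.0, -3.0, -5.0],
    [-2.0, 2.0, -1.0],
    [-6.0, -3.5, -4.0],
]

position, score = keypoint_with_score(raw_heatmap)
print(position, round(score, 3))
```

Because the sigmoid is monotonically increasing, applying it does not move the peak: the keypoint position is the same whether you take the maximum before or after the activation. Only the value attached to it changes.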

Marc