Movenet keypoint visualization for video works well frame by frame but gets bad results from saved keypoints

Question

I am working with the Movenet_SinglePose_Demo notebook (https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/movenet.ipynb), using the Thunder version of the model. When I visualize with their code:

```python
# Load the input image.
num_frames, image_height, image_width, _ = image.shape
crop_region = init_crop_region(image_height, image_width)

output_images = []
bar = display(progress(0, num_frames-1), display_id=True)
for frame_idx in range(num_frames):
  keypoints_with_scores = run_inference(
      movenet, image[frame_idx, :, :, :], crop_region,
      crop_size=[input_size, input_size])
  output_images.append(draw_prediction_on_image(
      image[frame_idx, :, :, :].numpy().astype(np.int32),
      keypoints_with_scores, crop_region=None,
      close_figure=True, output_image_height=300))
  crop_region = determine_crop_region(
      keypoints_with_scores, image_height, image_width)
  bar.update(progress(frame_idx, num_frames-1))

# Prepare gif visualization.
output = np.stack(output_images, axis=0)
to_gif(output, fps=10)
```

I get different results than when I first save the keypoint tensors to a list and then visualize them in a separate loop:

```python
# Load the input image.
num_frames, image_height, image_width, _ = image.shape
crop_region = init_crop_region(image_height, image_width)

keypoints = []
output_images = []

bar = display(progress(0, num_frames-1), display_id=True)

for frame_idx in range(num_frames):
  keypoints_with_scores = run_inference(
      movenet, image[frame_idx, :, :, :], crop_region,
      crop_size=[input_size, input_size])
  keypoints.append(keypoints_with_scores)

for frame_idx in range(num_frames):
  output_images.append(draw_prediction_on_image(
      image[frame_idx, :, :, :].numpy().astype(np.int32),
      tf.convert_to_tensor(keypoints[frame_idx]), crop_region=None,
      close_figure=True, output_image_height=300))
  crop_region = determine_crop_region(
      tf.convert_to_tensor(keypoints[frame_idx]), image_height, image_width)
  bar.update(progress(frame_idx, num_frames-1))

# Prepare gif visualization.
output = np.stack(output_images, axis=0)
to_gif(output, fps=10)
```

I really don't understand why these two give different results. It is not just that the keypoints wobble in the visualization: when I save the keypoints in one loop and visualize them in another, the keypoints come out completely wrong. How can I visualize results from saved keypoints?

Here is a frame of the video rendered without pre-saved keypoints, and here is the same frame rendered after I first saved the keypoints in one loop and then read them back in another. As you can see in the second picture, the keypoints for the legs are missing.
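
One thing that stands out in the second snippet: `crop_region` is never reassigned inside the inference loop, so every call to `run_inference` receives the initial crop, and the `determine_crop_region` call in the drawing loop no longer influences any prediction. Assuming the notebook helpers behave as in the linked Colab, where each frame's crop is derived from the previous frame's keypoints, that difference may well account for the missing leg keypoints. Below is a minimal sketch of the pattern. The helper `collect_keypoints` and its parameter names are hypothetical (not part of the notebook), and the model is passed in as plain callables so the block runs without TensorFlow:

```python
from typing import Any, Callable, List, Sequence


def collect_keypoints(
    frames: Sequence[Any],
    infer: Callable[[Any, Any], Any],
    next_crop: Callable[[Any], Any],
    initial_crop: Any,
) -> List[Any]:
    """Run inference on every frame and return the per-frame keypoints.

    The crop region is updated inside the same loop that runs inference,
    so frame N is always predicted with the crop derived from frame N-1.
    """
    crop_region = initial_crop
    saved = []
    for frame in frames:
        keypoints = infer(frame, crop_region)
        saved.append(keypoints)
        # Feed the crop forward here, not in a later drawing loop.
        crop_region = next_crop(keypoints)
    return saved


# Hypothetical wiring to the notebook helpers (left commented out because
# it needs the Colab environment):
#
# keypoints = collect_keypoints(
#     [image[i] for i in range(num_frames)],
#     lambda frame, crop: run_inference(
#         movenet, frame, crop, crop_size=[input_size, input_size]),
#     lambda kps: determine_crop_region(kps, image_height, image_width),
#     init_crop_region(image_height, image_width))
```

With the keypoints collected this way, the second loop would only need to call `draw_prediction_on_image` on each saved entry; it would not have to compute `crop_region` at all, since the crop only matters while inference is running.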
