
I am implementing a keypoint detection algorithm to recognize biomedical landmarks on images. I only have one type of landmark to detect. But in a single image, 1-10 of these landmarks can be present. I'm wondering what's the best way to organize the ground truth to maximize learning.

I considered creating 10 landmark coordinate slots per image and associating each with a flag that is either 0 (not present) or 1 (present). But this doesn't seem ideal: since the multiple landmarks in a single image are all the same type of biomedical element, the neural network shouldn't be trying to learn them as separate entities.

Any suggestions?

  • Not a *programming* question, hence off-topic here; please see the NOTE in https://stackoverflow.com/tags/deep-learning/info – desertnaut Oct 12 '22 at 14:15

1 Answer


One landmark type that can appear anywhere in the image sounds like a typical CNN problem. Your CNN filters should learn which features make up the landmark, but they don't care where it appears. That is the responsibility of the next layers. Hence, for training the CNN layers you can use a single-channel image of the same size as the target: 1 means "landmark at this pixel", 0 means no landmark there.
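
As a rough illustration of such a target image, here is a minimal sketch that turns a variable-length list of landmark coordinates into a fixed-size mask. The function name `landmarks_to_mask`, the `(row, col)` coordinate convention, and the 64x64 image size are assumptions made for the example, not something taken from the question.

```python
import numpy as np

def landmarks_to_mask(shape, landmarks):
    """Build a target image: 1 at each landmark pixel, 0 elsewhere.

    shape     -- (height, width) of the input image
    landmarks -- iterable of (row, col) integer pixel coordinates, any length
    """
    mask = np.zeros(shape, dtype=np.float32)
    for row, col in landmarks:
        mask[row, col] = 1.0
    return mask

# Images with different landmark counts still share one target format.
mask_a = landmarks_to_mask((64, 64), [(10, 12), (40, 50)])
mask_b = landmarks_to_mask((64, 64), [(5, 5), (20, 30), (33, 7), (60, 61)])
print(mask_a.shape, int(mask_a.sum()), int(mask_b.sum()))
```

The target has the same shape whether an image holds 1 landmark or 10, so no per-landmark slots or presence flags are needed.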

The next layers basically process the CNN-detected features. To train those, your ground truth should simply be the desired outcome. Do you just need a binary output (count > 0)? A somewhat accurate estimate of the count? Coordinates? Orientation? NNs don't care that much what they learn, so just give them in training what they should produce at inference.

MSalters
  • Thank you for the reply. To answer your questions: Yes, I am implementing convolution blocks for the NN to learn the visual patterns. What I need as an output are the coordinates of these landmark locations in the image. If I were dealing with a fixed number of landmarks (e.g. 3), I would simply store the positions of these 3 landmarks and use them to compute the error. Since the number varies, I'm not sure how to organize the ground truth. – Frederico Severgnini Oct 11 '22 at 16:08
  • @FredericoSevergnini: I'm not sure that method would work well; it sounds like it carries the risk that 2 of your outputs could report the same landmark. After all, they're both right. But what's really the problem with just using the black-and-white ground truth image? Counting blobs in the NN output is fairly trivial, that's classic Computer Vision. – MSalters Oct 11 '22 at 16:15
  • Yes, I agree with your observation. There would be nothing stopping the system from reporting the same landmark for 2 outputs. The reason I am trying to adopt keypoint detection here is because I need the detection to happen with pixel accuracy. I will later use this output to calculate the distance between these landmarks. Generating a blob around the area of interest wouldn't give me the precision I am looking for. – Frederico Severgnini Oct 11 '22 at 16:37
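
The blob counting mentioned in the comments, together with the distance measurement the asker wants, could be sketched roughly as below. This is only one possible reading of the suggestion: the 0.5 threshold, the 4-connectivity, the intensity-weighted centroid (used here to get one sub-pixel coordinate per blob), and the names `blob_centroids` and `pairwise_distances` are all assumptions for illustration. Library routines such as `scipy.ndimage.label` can do the labelling step; a plain flood fill is used here to keep the example dependent on NumPy only.

```python
from collections import deque
import numpy as np

def blob_centroids(prob_map, threshold=0.5):
    """Return one (row, col) centroid per 4-connected blob above threshold."""
    binary = prob_map > threshold
    visited = np.zeros(binary.shape, dtype=bool)
    height, width = binary.shape
    centroids = []
    for r0 in range(height):
        for c0 in range(width):
            if not binary[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill collects the pixels of one blob.
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and binary[nr, nc] and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            rows, cols = zip(*pixels)
            # Weight each pixel by the network output to get a sub-pixel centre.
            weights = prob_map[list(rows), list(cols)]
            centroids.append((float(np.average(rows, weights=weights)),
                              float(np.average(cols, weights=weights))))
    return centroids

def pairwise_distances(points):
    """Euclidean distance matrix between all detected centres, in pixels."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic stand-in for a network output containing two blobs.
demo = np.zeros((20, 20), dtype=np.float32)
demo[4:7, 4:7] = 0.9    # 3x3 blob centred on (5, 5)
demo[5, 14:17] = 0.8    # 1x3 blob centred on (5, 15)
centres = blob_centroids(demo)
print(len(centres), pairwise_distances(centres)[0, 1])
```

The number of centroids returned is the landmark count, so the same post-processing handles anywhere from 1 to 10 landmarks without a fixed-size output layer. Whether the centroid of a blob is accurate enough for the distance measurement depends on how tightly the network output concentrates around each landmark.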