As far as I understand, stereo matching algorithms should fail to predict the disparity in occluded regions. However, there are plenty of learning-based methods that produce a fully dense disparity map (just look at the KITTI stereo benchmark list: you need to scroll pretty far down before you see non-100% density). How do these methods fill in the disparity map in the occluded regions?

My theory is that, in the final few layers, these models learn to predict disparity from monocular cues, but I haven't found any papers that give an explanation, or even acknowledge this fact.
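
To make "occluded" concrete: these are the pixels that fail a classical left-right consistency check, i.e. the pixels where a dense network has to produce a value with no direct matching support. A minimal NumPy sketch of that check (the function name `lr_consistency_mask` and the 1-pixel threshold are my own choices; it assumes a rectified pair with positive disparities):

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """Return a boolean mask that is True where the left disparity is
    NOT confirmed by the right disparity map (typical for occlusions)."""
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # A left-image pixel (y, x) with disparity d corresponds to
    # (y, x - d) in the right image.
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    # Disparity the right view reports at the matched location.
    disp_right_at_match = disp_right[ys, x_right]
    # Inconsistent pixels: the two views disagree by more than `thresh`.
    return np.abs(disp_left - disp_right_at_match) > thresh
```

For example, `1 - lr_consistency_mask(dl, dr).mean()` gives the fraction of pixels with two-view support; everything outside that fraction is exactly what a 100%-density method fills in somehow.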

  • They fill that in by *making it up*. There's no visual basis for those values; they are not supported by measurements. They're literally made up. -- Are you asking how to make up convincing values? Or how these networks figure out ways to sell their fabrications to the loss function? – Christoph Rackwitz Apr 28 '23 at 12:35

0 Answers