As far as I understand, stereo matching algorithms should fail to predict disparity in occluded regions, since those pixels are visible in only one view and therefore have no correspondence to match. However, plenty of learning-based methods produce a fully dense disparity map (just look at the KITTI stereo benchmark leaderboard; you have to scroll pretty far down before you see a density below 100%). How do these methods fill in the disparity for the occluded regions?
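To make the question concrete: classical sparse methods typically *invalidate* occluded pixels with a left-right consistency check rather than fill them in. Here is a minimal sketch of that check, assuming NumPy disparity maps (the function name and threshold are my own choices); the pixels it rejects are exactly the ones a 100%-dense method must somehow hallucinate:

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """Return a boolean mask that is True where the left/right disparities agree.

    A genuine match should round-trip: d_L(x, y) ~ d_R(x - d_L(x, y), y).
    Occluded pixels have no true correspondence, so they fail this test.
    """
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Warp each left pixel to its claimed match in the right image.
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    diff = np.abs(disp_left - disp_right[ys, x_right])
    return diff <= thresh  # False at occlusions / mismatches
```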
My theory is that the final few layers of these models learn to predict disparity from monocular cues, but I haven't found any papers that explain this, or even acknowledge it.