I am familiar with two-view stereo but fuzzy on how SfM (structure from motion) and MVS (multi-view stereo) actually work.
Let's say I have two stereo pairs of cameras, (A, B) and (C, D). I can compute the depth map for camera A using two-view stereo with cameras A and B, and similarly the depth map for camera C using two-view stereo with cameras C and D. Based on the calibration, I can back-project depth map A into a point cloud and color it with pixel values from camera A, and likewise turn depth map C into a point cloud colored with values from camera C. In a perfect world, overlaying point cloud A and point cloud C would look seamless, but in the real world there is some color difference between what camera A and camera C capture for the same point in space. I have tried various ways of averaging colors for points visible in both camera A and camera C, but no matter what, there is an obvious color "seam" between points that are only visible in camera A and points that are visible in both camera A and camera C.
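To make concrete what I mean, here is a minimal sketch of the per-camera back-projection and the naive color averaging I am doing. All names (`backproject_depth`, `blend_overlap`, the intrinsics layout, the 50/50 blend weight) are just illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def backproject_depth(depth, color, K, cam_to_world):
    """Back-project a depth map into a colored point cloud (illustrative sketch).

    depth        : (H, W) depth values in the camera's own frame
    color        : (H, W, 3) image from the same camera
    K            : (3, 3) camera intrinsics
    cam_to_world : (4, 4) camera-to-world pose from calibration
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0

    # Pixel -> camera-frame point, scaled by depth
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u[valid] - cx) / fx * depth[valid]
    y = (v[valid] - cy) / fy * depth[valid]
    z = depth[valid]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # (N, 4) homogeneous

    # Camera frame -> world frame
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    colors = color[valid]                                     # per-point RGB
    return pts_world, colors


def blend_overlap(colors_a, colors_c, weight_a=0.5):
    """Naive averaging for points seen by both cameras; this is the kind of
    blending that still leaves a visible seam against A-only points."""
    return weight_a * colors_a + (1.0 - weight_a) * colors_c
```

After matching the shared points between the two clouds (for example by projecting cloud A into camera C), `blend_overlap` is applied only to those shared points, and the transition between blended points and A-only points is exactly where the seam shows up.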
However, this kind of color problem doesn't seem to exist in SfM and MVS, as the results from COLMAP, AliceVision and RealityCapture show. I've read multiple tutorials on how SfM/MVS works, but none of them specifically explains how the color problem is overcome. Most of them focus on how to generate depth and, in the case of SfM, on estimating the intrinsics and poses. Can someone explain what method conventional SfM/MVS uses to resolve the color differences? A link to a tutorial or paper that explains this would be appreciated as well.