I am familiar with two-view stereo but fuzzy on how SfM (structure from motion) and MVS (multi-view stereo) actually work.
Let's say I have two stereo pairs of cameras, (A, B) and (C, D). I can compute the depth map for camera A using two-view stereo with cameras A and B, and similarly the depth map for camera C using two-view stereo with cameras C and D. Based on the calibration, I can back-project depth map A into a point cloud and color it with pixel values from camera A, and likewise turn depth map C into a point cloud colored with values from camera C. In a perfect world, overlaying point cloud A and point cloud C would look seamless, but in the real world there is some color difference between what camera A and camera C capture for the same point in space. I have tried various ways of averaging colors for points visible in both camera A and camera C, but no matter what, there is an obvious color "seam" between points that are only visible in camera A and points that are visible in both camera A and camera C.
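To make concrete what I mean, here is a minimal sketch of the per-camera back-projection and the naive color averaging I am doing. All names (`backproject_depth`, `blend_overlap`, the intrinsics layout, the 50/50 blend weight) are just illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def backproject_depth(depth, color, K, cam_to_world):
    """Back-project a depth map into a colored point cloud (illustrative sketch).

    depth        : (H, W) depth values in the camera's own frame
    color        : (H, W, 3) image from the same camera
    K            : (3, 3) camera intrinsics
    cam_to_world : (4, 4) camera-to-world pose from calibration
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0

    # Pixel -> camera-frame point, scaled by depth
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u[valid] - cx) / fx * depth[valid]
    y = (v[valid] - cy) / fy * depth[valid]
    z = depth[valid]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # (N, 4) homogeneous

    # Camera frame -> world frame
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    colors = color[valid]                                     # per-point RGB
    return pts_world, colors


def blend_overlap(colors_a, colors_c, weight_a=0.5):
    """Naive averaging for points seen by both cameras; this is the kind of
    blending that still leaves a visible seam against A-only points."""
    return weight_a * colors_a + (1.0 - weight_a) * colors_c
```

After matching the shared points between the two clouds (for example by projecting cloud A into camera C), `blend_overlap` is applied only to those shared points, and the transition between blended points and A-only points is exactly where the seam shows up.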
However, this kind of color problem doesn't seem to exist in SfM and MVS, as the results from COLMAP, AliceVision and RealityCapture show. I've read multiple tutorials on how SfM/MVS works, but none of them specifically explains how the color problem is overcome. Most of them focus on how to generate depth and, in the case of SfM, on estimating the intrinsics and poses. Can someone explain what method conventional SfM/MVS uses to resolve the color differences? A link to a tutorial or paper that explains this would be appreciated as well.