For learning purposes, I want to implement a stereo algorithm in Python (and NumPy) that computes a disparity map. As image data, I used the Tsukuba image pair from the Middlebury stereo dataset*. For simplicity, I chose normalised cross-correlation (NCC)** as the similarity measure for finding corresponding pixels. I will assume scanline agreement. Here is my NCC implementation:
import numpy as np

def ncc(left_patch, right_patch):
    # Patch means
    left_mu = np.mean(left_patch)
    right_mu = np.mean(right_patch)
    # Patch standard deviations
    left_sigma = np.sqrt(np.mean((left_patch - left_mu)**2))
    right_sigma = np.sqrt(np.mean((right_patch - right_mu)**2))
    # Cross term: E[L*R] - E[L]*E[R]
    patch = left_patch * right_patch
    mu = left_mu * right_mu
    num = np.mean(patch) - mu
    denom = left_sigma * right_sigma
    # NCC value in [-1, 1]
    return num / denom
where left_patch and right_patch are 3x3 patches taken from the original images. This outputs a value between -1 and 1, which describes the similarity between the two pixels.
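For example, calling it on two small arrays gives a single score in that range (illustrative values only, not taken from the actual images):

import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
ncc(a, a)        # 1.0: identical patches correlate perfectly
ncc(a, -a)       # -1.0: an inverted patch correlates negatively
ncc(a, a + 10.)  # 1.0: NCC is invariant to an additive brightness offset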
The idea now is to find the best-matching pixel in the other image; the disparity between the two pixels should then be stored in a new image - the disparity map.
Since I assume scanline agreement, I only have to search within a single image row. For each pixel in the row, I want to take the index of the position that maximises the NCC and store it as the disparity value.
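In code, the search I have in mind looks roughly like this (a simplified sketch - the image names, the full-row search range and the untouched border pixels are just placeholder choices):

import numpy as np

half = 1                                    # 3x3 patches -> half window of 1
height, width = left_img.shape              # e.g. 288 x 384 for Tsukuba
disparity_map = np.zeros((height, width))

for y in range(half, height - half):
    for x in range(half, width - half):
        left_patch = left_img[y-half:y+half+1, x-half:x+half+1]
        scores = np.full(width, -np.inf)
        # scanline agreement: compare only against pixels in the same row
        for x_r in range(half, width - half):
            right_patch = right_img[y-half:y+half+1, x_r-half:x_r+half+1]
            scores[x_r] = ncc(left_patch, right_patch)
        # as described above: store the index of the best NCC match
        disparity_map[y, x] = np.argmax(scores)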
My problem is that my results are rather odd: my disparity values are around 180-200 pixels for an image that is 384x288 pixels. Here is the resulting image.
Can you see the mistake in my thinking?
(*) vision.middlebury.edu/stereo/data/scenes2001/data/anigif/orig/tsukuba_o_a.gif
(**) N. Einecke and J. Eggert, "A Two-Stage Correlation Method for Stereoscopic Depth Estimation"