
Assuming a static scene, with a single camera moving exactly sideways over a small distance, I have two frames and the optic flow computed between them (I use OpenCV's calcOpticalFlowFarneback):

Here the scatter points are detected features, painted in pseudocolor with their depth values (red means little depth, close to the camera; blue means more distant). I obtain those depth values by simply inverting the optic flow magnitude, like d = 1 / flow. Seems kinda intuitive, in a motion-parallax way: the brighter the flow (i.e. the larger its magnitude), the closer the object is to the observer. So there's a cube, exposing a frontal edge and a bit of a side edge to the camera.
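For reference, those values come from roughly this kind of code (a minimal sketch; the file names, Farnebäck parameters and the feature detector are placeholders for what I actually use):

```python
import cv2
import numpy as np

# Two grayscale frames from the sideways camera motion (placeholder file names).
prev_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Dense optic flow: flow[y, x] = (dx, dy) displacement of pixel (x, y).
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)

# Flow magnitude per pixel, then the naive "depth" as its inverse.
magnitude = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
depth = 1.0 / (magnitude + 1e-6)  # small epsilon to avoid division by zero

# Sample the depth at detected feature points (features = N x 2 array of (x, y)).
features = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                   qualityLevel=0.01, minDistance=7).reshape(-1, 2)
feature_depth = depth[features[:, 1].astype(int), features[:, 0].astype(int)]
```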

But then I'm trying to project those feature points from the camera plane to real-world coordinates to make a kind of top-view map, where X = (x * d) / f and Y = d (d is depth, x is the pixel coordinate, f is the focal length, and X and Y are real-world coordinates). And here's what I get:
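In code, that projection step is roughly the following (continuing the sketch above; the focal length value and the principal point cx are placeholder assumptions on my part):

```python
# Top-view projection of each feature, as described above:
#   X = (x * d) / f,   Y = d
# x is assumed to be measured relative to the optical axis (image centre),
# and f is the focal length in pixels (placeholder value).
f = 700.0
cx = prev_gray.shape[1] / 2.0

x = features[:, 0] - cx          # pixel coordinate relative to the centre
X = (x * feature_depth) / f      # real-world lateral coordinate
Y = feature_depth                # real-world depth coordinate
```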

Well, doesn't look cubic to me. Looks like the picture is skewed to the right. I've spent some time thinking about why, and it seems that 1 / flow is not an accurate depth metric. Playing with different values, say, if I use 1 / power(flow, 1 / 3), I get a better picture:

But, of course, the power of 1/3 is just a magic number out of my head. The question is, what is the relationship between optic flow and depth in general, and how am I supposed to estimate it for a given scene? We're only considering camera translation here. I've stumbled upon some papers, but no luck finding a general equation yet. Some, like this one, propose a variation of 1 / flow, which isn't going to work, I guess.

Update

What bothers me a little is that simple geometry points me to the 1 / flow answer too. Like, optic flow is the same (in my case) as disparity, right? Then using this formula I get d = B*f / (x2 - x1), where B is the distance between the two camera positions, f is the focal length, and x2 - x1 is precisely the optic flow. The focal length is a constant, and B is constant for any two given frames, so that leaves me with 1 / flow again, multiplied by a constant. Do I misunderstand something about what optic flow is?
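Just to spell that formula out with numbers (a toy sketch; B, f and the flow value are made up purely to show the units):

```python
# Depth from the stereo/parallax formula d = B * f / (x2 - x1).
# B is the camera translation between frames (metres), f the focal length
# in pixels, (x2 - x1) the horizontal flow/disparity in pixels.
B = 0.05          # 5 cm sideways camera motion (made-up value)
f = 700.0         # focal length in pixels (made-up value)
disparity = 14.0  # measured horizontal shift of a feature, in pixels

d = B * f / disparity
print(d)  # 2.5 metres -- i.e. depth is 1/flow scaled by the constant B*f
```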

rocknrollnerd
  • What about using artificial data (OpenGL rendering) to get both depth values and flow? By analyzing them you might be able to find a formula, or not. – Micka Dec 14 '14 at 02:25
  • That's a great idea and I'm going to try it - but for now I'm a bit concerned about the fact that there already seems to be a formula (see the update, please) - and it doesn't work. – rocknrollnerd Dec 14 '14 at 08:37
  • As far as I understand, you want to use something like stereo imaging by moving a single camera just a little in a known direction. I'm no pro, but afaik disparity is the shift from the "predicted" position of the pixels. In stereo the cameras are calibrated, which means you can compute the pixel positions in the other image, but this computed position is only true for 3D points that lie on the "horopter". So if the 3D point isn't on the horopter, the pixel coordinate differs from the projection, and this difference is the disparity. Afaik. – Micka Dec 14 '14 at 11:48
  • After looking at your link, maybe I'm wrong... maybe you just have a scaling factor wrong. Do you use camera intrinsics? And check whether the formula only holds for rectified images! – Micka Dec 14 '14 at 11:50
  • Maybe have a look at "Optic Flow Goes Stereo: A Variational Method for Estimating Discontinuity-Preserving Dense Disparity Maps" by Natalia Slesareva, Andrés Bruhn, and Joachim Weickert. – Micka Dec 14 '14 at 15:15
  • Turns out, my focal length parameter was incorrect. It was too small, hence bigger depth values appeared far away from the central axis. Another problem is that I still have to convert my relative-depth map into absolute (pixel) values, otherwise part of the skewness remains (like in the second scatter plot: points corresponding to the cube edges look fine, but some upper points are still skewed to the right). But at least, I see the light now. :-) Thanks for your comments! – rocknrollnerd Dec 14 '14 at 15:43

1 Answer


For a static scene, moving a camera precisely sideways by a known amount is exactly the same as a stereo camera setup. From this you can indeed estimate depth, if your system is calibrated.

Note that calibration in this sense is rather broad. In order to get really accurate depth, you will in the end need to supply a scale parameter on top of the regular calibration machinery you have in OpenCV, or else there is a single uniform scale ambiguity in the 3D reconstruction (this last step is often called going from the merely "Euclidean" reconstruction to the "metric" one).

Another thing that is part of this broad calibration is lens distortion compensation. Before anything else, you probably want to force your cameras to behave like pinhole cameras (which real-world cameras usually don't).
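Concretely, that compensation in OpenCV looks something like the sketch below (the intrinsic matrix and distortion coefficients here are placeholder values; in practice they come from your own calibration, e.g. cv2.calibrateCamera on checkerboard images):

```python
import cv2
import numpy as np

# Intrinsics and distortion coefficients from a prior calibration
# (placeholder values, not real calibration results).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.25, 0.1, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("frame1.png")
undistorted = cv2.undistort(img, K, dist)  # now approximately a pinhole image
```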

With that said, optical flow is definitely very different from a metric depth map. Even if you properly calibrate and rectify your system first, optical flow is still not equivalent to disparity estimation. Once your system is rectified, there is no point in doing a full optical flow estimation (such as Farnebäck), because the problem is thereafter constrained along the horizontal lines of the image. Doing a full optical flow estimation (giving 2 degrees of freedom per pixel) will likely introduce more error after said rectification.
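For a rectified pair, that constrained 1-D search along image rows is exactly what OpenCV's stereo matchers do; a minimal sketch (file names and matcher parameters are placeholders):

```python
import cv2

# After rectification, correspondences lie on the same image row, so a
# 1-D disparity search (semi-global block matching here) replaces full
# 2-D optical flow.
left = cv2.imread("rectified_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("rectified_right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = stereo.compute(left, right).astype(float) / 16.0  # fixed-point -> pixels

# With baseline B and focal length f (in pixels), depth = B * f / disparity.
```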

A great reference for all this stuff is the classic "Multiple View Geometry in Computer Vision" by Hartley and Zisserman.

Stefan Karlsson