
There probably are answers, but I simply did not understand what I found. Maybe it's the language barrier. So I've decided to finally ask. What I need is to find 3D coordinates from two videos recorded by two cameras. The setup is like this:

[Image: diagram of the two-camera setup]

I can't seem to grasp how to do this. What I have is

  • Pixel coordinates on both pictures (relative to the (0, 0) point of each picture)
  • Focal lengths
  • Distance of both cameras from the (0, 0, 0) real-world point (Ax and By)
  • Size of a pixel
  • The angle between the cameras, which I know is 90 degrees

What now? The OpenCV docs contain this projection formula:

s * [u, v, 1]^T = A * [R|t] * [X, Y, Z, 1]^T,   where A = [fx, 0, cx; 0, fy, cy; 0, 0, 1]

I don't know what 's' is, nor the [R|t] matrix (the extrinsic parameters). I don't know where the principal point (cx, cy) is or how to find it, and I can only assume that setting it to 0 won't be catastrophic. Also, this formula seems to use only one of the 2D images, not both.
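To make that formula concrete, here is a small numeric sketch of the single-camera projection equation. All values (focal length in pixels, principal point, rotation, translation, world point) are illustrative placeholders, not the question's actual calibration:

```python
import numpy as np

# Pinhole projection: s * [u, v, 1]^T = K @ [R | t] @ [X, Y, Z, 1]^T
# All values below are illustrative, not measured.
fx = fy = 3086.0          # focal length in pixels (e.g. 5400 um / 1.75 um)
cx, cy = 640.0, 360.0     # principal point, roughly the image centre
K = np.array([[fx,  0, cx],
              [0,  fy, cy],
              [0,   0,  1]])

R = np.eye(3)             # camera aligned with the world axes
t = np.array([[0.0], [0.0], [0.0]])
Rt = np.hstack([R, t])    # 3x4 extrinsic matrix [R | t]

Xw = np.array([0.1, 0.05, 2.0, 1.0])  # homogeneous world point, 2 m away
p = K @ Rt @ Xw           # = s * (u, v, 1)
s = p[2]                  # the scale factor 's' is simply the depth here
u, v = p[0] / s, p[1] / s
print(u, v)
```

Dividing by s is what turns the projective 3-vector back into ordinary pixel coordinates; that is all 's' does.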

I know of the calibrateCamera, solvePnP, and stereoCalibrate functions, but I don't know how to use them.

I know just how complex it gets when you have cameras as two "eyes"; I hoped it'd be easier in a situation where the cameras shoot perpendicular images. I now have a formula to calculate the 3D coordinates, but it's not exactly precise. The error is under 1 inch, but that's still 1 inch too much.

xa, ya, xb, yb - pixel coordinates from the pictures
focalA, focalB - focal lengths
W = -(Ax*xb*pixelSize - focalB*By)/(xa*pixelSize*xb*pixelSize - focalA*focalB)
X = Ax + W*xa*pixelSize
Y = W*focalA
Z = W*xa*pixelSize
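In code, the formula above reads as follows. The input values are placeholders for illustration, not the real measurements:

```python
# Sketch of the triangulation formula above. Pixel coordinates,
# focal lengths, and camera offsets are placeholder values.
pixelSize = 1.75e-6          # metres per pixel (manufacturer value)
focalA = focalB = 5400e-6    # focal lengths in metres
Ax, By = 1.0, 1.0            # camera distances from the world origin, metres
xa, ya = 120.0, -35.0        # pixel coordinates in image A
xb, yb = -80.0, -35.0        # pixel coordinates in image B

W = -(Ax*xb*pixelSize - focalB*By) / (xa*pixelSize*xb*pixelSize - focalA*focalB)
X = Ax + W*xa*pixelSize
Y = W*focalA
Z = W*xa*pixelSize

print(X, Y, Z)
```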

Errors:

[Image: table of measurement errors per test point]

Those are for the focal length and pixel size provided by the manufacturer: 5400 um and 1,75 um. However, the errors are smallest for the values 4620 um and 1,69 um, where the biggest one is for the #3 X axis (2,3 cm), height errors almost disappear (0,2 cm max), and the rest are either 0,1 cm or 1-1,5 cm.

Petersaber
    You should read a book about stereo vision and learn the basic concept first. – Yang Kui Jul 06 '15 at 09:46
  • @YangKui I know, unfortunately I am pressed for time. I can do the math, I just need the explanation of these few points, mainly what the extrinsic parameters are and how to find the principal point. The problem is I don't know of any literature on this subject in my language, and the English texts are a tough read. I even tried to do the math myself, and I got pretty close (error under 2cm), except for the Z axis, where the error is quite big – Petersaber Jul 06 '15 at 10:11
  • *"I can do the math"* - ok, then read the section of [this tutorial](http://users.cecs.anu.edu.au/~hartley/Papers/CVPR99-tutorial/tutorial.pdf) that covers Two View Geometry. Then realise that it's not quite so easy, then [buy the book](http://www.amazon.co.uk/Multiple-View-Geometry-Computer-Vision/dp/0521540518), read it and start again ;-). BTW, what *is* your native language? Maybe there is a suitable reference that has been translated that we could recommend. – Roger Rowland Jul 06 '15 at 10:17
  • My first language is Polish. – Petersaber Jul 06 '15 at 10:22
  • That's what I thought from your profile. Unfortunately I don't think there is a Polish translation of Zisserman and Hartley's book, but anything that covers epipolar geometry would be helpful. There must be other Polish computer vision people reading this - anyone? – Roger Rowland Jul 06 '15 at 10:47
    In order to get the principal point and the extrinsic parameters, you have to calibrate your camera system first. That can be done using the opencv function stereoCalibrate, or use the famous matlab calibration toolbox. Besides, cx cy are definitely not 0s. If the size of your image is [sx, sy], [cx, cy] will be close to [sx / 2, sy / 2]. – Yang Kui Jul 06 '15 at 10:52
  • Since you have mentioned the distance of both cameras from 0,0,0 in the real world was known, you can calculate out the translation from A to B accordingly, and, that translation is T. For R, since the angle is 90, R is probably [1, 0, 0 ; 0, 0, 1; 0, -1, 0] or [0, 0, -1; 0, 1, 0; 1 , 0, 0], not for sure. – Yang Kui Jul 06 '15 at 11:07
  • @YangKui so, the (cx, cy) point is in relation to top-left corner of the image, and not the center. Good to know. As for the rest, let me process it – Petersaber Jul 06 '15 at 11:11
  • @Petersaber I think i meant [cx cy] is close to the center of the image, not the top-left corner. – Yang Kui Jul 06 '15 at 11:18
    @YangKui I know. And the cx, cy of a 1000x1000 picture would be equal (more or less) to (500, 500) as opposed to (0, 0). So the coordinates are "counted" from top-left, as usual, instead of the center (where we'd have a negative half, 0 at the center, and positive half) – Petersaber Jul 06 '15 at 11:23
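Following Yang Kui's comments above, the extrinsics might be sketched like this. Which rotation is correct depends on the axis conventions of the actual setup, so both candidates are shown; the camera distances are placeholders:

```python
import numpy as np

# The two candidate 90-degree rotations from the comments; which one
# applies depends on the axis conventions, so this is only illustrative.
R1 = np.array([[1, 0, 0],
               [0, 0, 1],
               [0, -1, 0]], dtype=float)
R2 = np.array([[0, 0, -1],
               [0, 1, 0],
               [1, 0, 0]], dtype=float)

# Translation from A to B, computed from the known distances to the
# world origin (Ax along X for camera A, By along Y for camera B);
# the values are placeholders.
Ax, By = 1.0, 1.0
center_A = np.array([Ax, 0.0, 0.0])
center_B = np.array([0.0, By, 0.0])
T = center_B - center_A

# Sanity check: both candidates are proper rotations (det = 1).
print(np.linalg.det(R1), np.linalg.det(R2))
```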

2 Answers


Beyond telling you to read about stereo vision as @YangKui suggested, I can answer some of your sub-questions.

The equation you quote is the (single camera) 3D to 2D projection equation. This is a projective geometry equation (hence the 1s as the last coordinates) and everything is up to some scale s.

  • s is this scale factor.
  • R is the 3x3 Rotation of the camera relative to the world/chosen coordinate system.
  • t is the translation of the camera origin from the world/chosen coordinate system origin.
  • cx and cy are the coordinates of the principal point of the image - the point on the image plane, in pixel units, where the Z axis intersects it. It is often assumed to be at the center of the image.
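To make the role of (cx, cy) concrete: a point on the optical axis projects exactly to the principal point. The intrinsics below are illustrative, not calibrated values:

```python
import numpy as np

# A point on the optical axis (X = Y = 0 in camera coordinates)
# projects exactly to the principal point (cx, cy).
fx, fy, cx, cy = 3086.0, 3086.0, 500.0, 500.0
K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]])

Xc = np.array([0.0, 0.0, 3.0])   # camera coordinates, on the Z axis
p = K @ Xc
u, v = p[0] / p[2], p[1] / p[2]
print(u, v)
```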
Adi Shavit
  • 3D to 2D... so I basically took the reverse of the equation I need? Also, if the camera is perfectly level, will R basically contain only zeros and ones? And what exactly is the translation of the camera origin? Does that mean the distance from the origin point (where x, y and z intersect on my picture)? A (Ax, 0, 0) vector? – Petersaber Jul 06 '15 at 11:30
  • Yes. But you have to dig more deeply into two-camera reconstruction. In this case it is common to choose one camera at the origin and the other in relation to the first. – Adi Shavit Jul 06 '15 at 11:32

One approach, which I find provides intuition if not a high-performance implementation, is to construct the camera matrix for both cameras and then use nonlinear optimization to solve for the 3D point M that minimizes the "reprojection error".

So come up with the camera matrices: A's camera matrix will map A's camera center in world coordinates to (0, 0, 0) in A's camera coordinates. The rotation part of A's camera matrix will map (0, 1, 0) in world coordinates to (0, 0, 1) in camera coordinates.

Now you can map world coordinates to A and B image coordinates, so for any (x, y, z) you have a corresponding 4-vector (x_A, y_A, x_B, y_B). If you feed in a candidate point such as (A_x, B_y, 0), you get a 4-vector out. The difference between that 4-vector and the measured positions is your reprojection error. Throw that at a solver and it should quickly converge on an answer.
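This answer's approach can be sketched end-to-end as below. The camera matrices, point, and the hand-rolled Gauss-Newton loop are all illustrative stand-ins (a real implementation would use calibrated matrices and a library solver):

```python
import numpy as np

# Sketch of the approach above: build two camera matrices, then solve
# for the 3D point M that minimises the reprojection error.
# The intrinsics and poses below are illustrative, not calibrated.
K = np.array([[1000.0, 0, 500], [0, 1000.0, 500], [0, 0, 1]])
# Camera A at (2, 0, 0) looking along -X; camera B at (0, 2, 0) looking along -Y.
R_A = np.array([[0, 1, 0], [0, 0, -1], [-1, 0, 0]], dtype=float)
R_B = np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]], dtype=float)
P_A = K @ np.hstack([R_A, -R_A @ np.array([[2.0], [0], [0]])])
P_B = K @ np.hstack([R_B, -R_B @ np.array([[0.0], [2.0], [0]])])

def project(P, X):
    """Project a 3D point through a 3x4 camera matrix to pixels."""
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]

def residual(X, obs):
    """Reprojection error: predicted minus observed (x_A, y_A, x_B, y_B)."""
    return np.concatenate([project(P_A, X) - obs[:2],
                           project(P_B, X) - obs[2:]])

# Synthesise observations from a known point, then recover it.
M_true = np.array([0.3, 0.2, 0.1])
obs = np.concatenate([project(P_A, M_true), project(P_B, M_true)])

# Tiny Gauss-Newton loop with a numerical Jacobian, standing in for a
# real least-squares solver.
M = np.zeros(3)
for _ in range(20):
    r = residual(M, obs)
    J = np.empty((4, 3))
    for j in range(3):
        d = np.zeros(3); d[j] = 1e-6
        J[:, j] = (residual(M + d, obs) - r) / 1e-6
    M = M - np.linalg.lstsq(J, r, rcond=None)[0]

print(M)   # should be close to M_true
```

With only two views and four residuals for three unknowns, the problem is small and well-conditioned, which is why even this naive solver converges quickly.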

You might try "Multiple View Geometry in Computer Vision" by Hartley and Zisserman.

Ben