OpenCV: 3D Pose estimation of color markers using StereoCamera system

Question

I have a stereo camera system and correctly calibrate it using both, cv::calibrateCamera and cv::stereoCalibrate. My reprojection error seems to be okay:

Cam0: 0.401427
Cam1: 0.388200
Stereo: 0.399642

A snapshot of the ongoing calibration process

I check my calibration by calling cv::stereoRectify and transforming my images using cv::initUndistortRectifyMap and cv::remap. The result is shown below (Something strange I noticed is when displaying the rectified images there are usually artifacts in form of a deformed copy of the original image on one or sometimes even both images):

Rectification

I also correctly estimate the position of my markers in pixel coordinates using cv::findContours on a thresholded HSV image.

enter image description here

Unfortunately, when I now try to cv::triangulatePoints my results are very poor compared to my estimated coordinates, especially in x-direction:

P1 = {   58 (±1),  150 (±1), -90xx (±2xxx)  } (bottom)
P2 = {  115 (±1),  -20 (±1), -90xx (±2xxx)  } (right)
P3 = { 1155 (±6),  575 (±3), 60xxx (±20xxx) } (top-left)

Those are the results in mm in camera coordinates. Both cameras are positioned approximately 550 mm away from the checkerboard and the square size is 13 mm. Apparently, my results are not even close to what I expect (negative and huge z-coordinates).

So my questions are:

I followed the stereo_calib.cpp sample quite closely and I seem to at least visually obtain good result (see reprojection error). Why are my triangulation results so poor?
How can I transform my results to the real-world coordinate system so I can actually check my results quantitatively? Do I have to do it manually as shown over here, or is there some OpenCV functions for that matter?

Here is my code:

std::vector<std::vector<cv::Point2f> > imagePoints[2];
std::vector<std::vector<cv::Point3f> > objectPoints;

imagePoints[0].resize(s->nrFrames);
imagePoints[1].resize(s->nrFrames);
objectPoints.resize( s->nrFrames );

// [Obtain image points..]
// cv::findChessboardCorners, cv::cornerSubPix

// Calc obj points
for( int i = 0; i < s->nrFrames; i++ )
    for( int j = 0; j < s->boardSize.height; j++ )
        for( int k = 0; k < s->boardSize.width; k++ )
            objectPoints[i].push_back( Point3f( j * s->squareSize, k * s->squareSize, 0 ) );


// Mono calibration
cv::Mat cameraMatrix[2], distCoeffs[2];
cameraMatrix[0] = cv::Mat::eye(3, 3, CV_64F);
cameraMatrix[1] = cv::Mat::eye(3, 3, CV_64F);

std::vector<cv::Mat> tmp0, tmp1;

double err0 = cv::calibrateCamera( objectPoints, imagePoints[0], cv::Size( 656, 492 ),
    cameraMatrix[0], distCoeffs[0], tmp0, tmp1,    
    CV_CALIB_FIX_PRINCIPAL_POINT + CV_CALIB_FIX_ASPECT_RATIO );
std::cout << "Cam0 reproj err: " << err0 << std::endl;

double err1 = cv::calibrateCamera( objectPoints, imagePoints[1], cv::Size( 656, 492 ),
    cameraMatrix[1], distCoeffs[1], tmp0, tmp1,
    CV_CALIB_FIX_PRINCIPAL_POINT + CV_CALIB_FIX_ASPECT_RATIO );
std::cout << "Cam1 reproj err: " << err1 << std::endl;

// Stereo calibration
cv::Mat R, T, E, F;

double err2 = cv::stereoCalibrate(objectPoints, imagePoints[0], imagePoints[1],
    cameraMatrix[0], distCoeffs[0], cameraMatrix[1], distCoeffs[1],
    cv::Size( 656, 492 ), R, T, E, F,
    cv::TermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, 1e-5),
    CV_CALIB_USE_INTRINSIC_GUESS + // because of mono calibration
    CV_CALIB_SAME_FOCAL_LENGTH +
    CV_CALIB_RATIONAL_MODEL +
    CV_CALIB_FIX_K3 + CV_CALIB_FIX_K4 + CV_CALIB_FIX_K5);
std::cout << "Stereo reproj err: " << err2 << std::endl;

// StereoRectify
cv::Mat R0, R1, P0, P1, Q;
Rect validRoi[2];
cv::stereoRectify( cameraMatrix[0], distCoeffs[0],
            cameraMatrix[1], distCoeffs[1],
            cv::Size( 656, 492 ), R, T, R0, R1, P0, P1, Q,
            CALIB_ZERO_DISPARITY, 1, cv::Size( 656, 492 ), &validRoi[0], &validRoi[1]);


// [Track marker..]
// cv::cvtColor, cv::inRange, cv::morphologyEx, cv::findContours, cv::fitEllipse, *calc ellipsoid centers*

// Triangulation
unsigned int N = centers[0].size();


cv::Mat pnts3D;
cv::triangulatePoints( P0, P1, centers[0], centers[1], pnts3D );


cv::Mat t = pnts3D.t();
cv::Mat pnts3DT = cv::Mat( N, 1, CV_32FC4, t.data );

cv::Mat resultPoints; 
cv::convertPointsFromHomogeneous( pnts3DT, resultPoints );

Finally, resultPoints is supposed to contain column vectors of my 3D positions in camera coordinates.

Edit: I removed some unnecessary conversions to shorten the code

Edit2: The results I get using @marols suggested solution for triangulation

Qualitative and quantitative triangulation results

P1 = { 111,  47, 526 } (bottom-right)
P2 = {  -2,   2, 577 } (left)
P3 = {  64, -46, 616 } (top)

score 5 · Accepted Answer · answered Jul 01 '14 at 20:28

5

My answer will focus on suggesting another solution to triangulatePoints. In case of stereo vision, you can use matrix Q returned by stereo rectification in following way:

std::vector<cv::Vec3f> surfacePoints, realSurfacePoints;

unsigned int N = centers[0].size();
for(int i=0;i<N;i++) {
    double d, disparity;
    // since you have stereo vision system in which cameras lays next to 
    // each other on OX axis, disparity is measured along OX axis
    d = T.at<double>(0,0);
    disparity = centers[0][i].x - centers[1][i].x;

    surfacePoints.push_back(cv::Vec3f(centers[0][i].x, centers[0][i].y, disparity));
}

cv::perspectiveTransform(surfacePoints, realSurfacePoints, Q);

Please adapt following snippet to your code, I might made some mistakes, but the point is to create an array of cv::Vec3f's, each of them having following structure: (point.x, point.y, disparity between point on second image) and pass it to the perspectiveTransform method (see docs for more details). If you would like to get more into details of how matrix Q is created (basically it represents a "reverse" projection from an image to real world point) see "Learning OpenCV" book, page 435.

In the stereo vision system I have developed, described method works fine and gives proper results on even bigger calibration errors (like 1.2).

answered Jul 01 '14 at 20:28

marol

4,034
3
22
31

Thanks, your solution works like a charm! Still, I would love to see some ideas regarding my second question. Do you have any idea on how to easily convert to real world coordinates? – pdresselhaus Jul 02 '14 at 14:58
@Random-I-Am the values returned by snippet I've provided return points in the world coordinates. From what I know, cv::triangulatePoints() returns points in a homogeneous coordinates (x,y,z,w), so only what you have to do is to divide all coordinates by w making point (x/w, y/w, z/w) in world coordinates. – marol Jul 02 '14 at 20:34
Now that probably depends on the definition of `world coordinates`. I've updated my question and supplied the results I get using your suggested solution. The values I get are in `camera coordinates` of the left camera (even though I do not understand why `y` would decrease when shifting a point to the top of the image). I want to transform these coordinates to `object coordinates`. E.g. when placing a marker on top of the top-left feature of my chessboard the result I expect is P = { 0,0,0 } – pdresselhaus Jul 03 '14 at 08:04
Since you have a checkerboard, you can estimate its pose quite easily. You just have to detect your checkerboard, and then find 3D positions of its 4 corners. From this and some simple geometry, you can easily define a frame attached to your checkerboard top-left corner and compare your markers 3D positions to this in the checkerboards plane. The only problem that you will face with this is that it is ambiguous, because your checkerboard has 2 axes of symmetry. – Ben Jul 08 '14 at 13:25
@Random-I-Am: I would say Ben's comment is quite right. So using my answer you can find 3D positions and then using simple math to do the rest. And I think this is the answer for your second question. – marol Jul 08 '14 at 13:27

score 0 · Answer 2 · answered Jul 10 '14 at 13:53

To project into real world coordinates system, you need the Projection camera matrix. This can be done as:

cv::Mat KR = CalibMatrix * R;
cv::Mat eyeC = cv::Mat::eye(3,4,CV_64F);
eyeC.at<double>(0,3) = -T.at<double>(0);
eyeC.at<double>(1,3) = -T.at<double>(1);
eyeC.at<double>(2,3) = -T.at<double>(2);

CameraMatrix = cv::Mat(3,4,CV_64F);
CameraMatrix.at<double>(0,0) = KR.at<double>(0,0) * eyeC.at<double>(0,0) + KR.at<double>(0,1) * eyeC.at<double>(1,0) + KR.at<double>(0,2) * eyeC.at<double>(2,0);
CameraMatrix.at<double>(0,1) = KR.at<double>(0,0) * eyeC.at<double>(0,1) + KR.at<double>(0,1) * eyeC.at<double>(1,1) + KR.at<double>(0,2) * eyeC.at<double>(2,1);
CameraMatrix.at<double>(0,2) = KR.at<double>(0,0) * eyeC.at<double>(0,2) + KR.at<double>(0,1) * eyeC.at<double>(1,2) + KR.at<double>(0,2) * eyeC.at<double>(2,2);
CameraMatrix.at<double>(0,3) = KR.at<double>(0,0) * eyeC.at<double>(0,3) + KR.at<double>(0,1) * eyeC.at<double>(1,3) + KR.at<double>(0,2) * eyeC.at<double>(2,3);
CameraMatrix.at<double>(1,0) = KR.at<double>(1,0) * eyeC.at<double>(0,0) + KR.at<double>(1,1) * eyeC.at<double>(1,0) + KR.at<double>(1,2) * eyeC.at<double>(2,0);
CameraMatrix.at<double>(1,1) = KR.at<double>(1,0) * eyeC.at<double>(0,1) + KR.at<double>(1,1) * eyeC.at<double>(1,1) + KR.at<double>(1,2) * eyeC.at<double>(2,1);
CameraMatrix.at<double>(1,2) = KR.at<double>(1,0) * eyeC.at<double>(0,2) + KR.at<double>(1,1) * eyeC.at<double>(1,2) + KR.at<double>(1,2) * eyeC.at<double>(2,2);
CameraMatrix.at<double>(1,3) = KR.at<double>(1,0) * eyeC.at<double>(0,3) + KR.at<double>(1,1) * eyeC.at<double>(1,3) + KR.at<double>(1,2) * eyeC.at<double>(2,3);
CameraMatrix.at<double>(2,0) = KR.at<double>(2,0) * eyeC.at<double>(0,0) + KR.at<double>(2,1) * eyeC.at<double>(1,0) + KR.at<double>(2,2) * eyeC.at<double>(2,0);
CameraMatrix.at<double>(2,1) = KR.at<double>(2,0) * eyeC.at<double>(0,1) + KR.at<double>(2,1) * eyeC.at<double>(1,1) + KR.at<double>(2,2) * eyeC.at<double>(2,1);
CameraMatrix.at<double>(2,2) = KR.at<double>(2,0) * eyeC.at<double>(0,2) + KR.at<double>(2,1) * eyeC.at<double>(1,2) + KR.at<double>(2,2) * eyeC.at<double>(2,2);
CameraMatrix.at<double>(2,3) = KR.at<double>(2,0) * eyeC.at<double>(0,3) + KR.at<double>(2,1) * eyeC.at<double>(1,3) + KR.at<double>(2,2) * eyeC.at<double>(2,3);

OpenCV: 3D Pose estimation of color markers using StereoCamera system

2 Answers2

Linked