
Multi-camera calibration with overlapping views is straightforward. However, I want to calibrate a multi-camera rig without overlapping views, e.g. two cameras mounted back-to-back. I looked at some resources and found that OpenCV's hand-eye calibration seems to solve my problem.

To verify the idea, I set up a simple test case in which I use hand-eye calibration to calibrate a stereo pair. In my setup, two cameras form a rig and both face the same ArUco board. My goal is to find the extrinsics between the two cameras.

Based on OpenCV documentation, I created the input as follows:

gripper -> left camera; base -> ArUco board; target -> the same board; cam -> right camera. board2leftCam and board2rightCam can be calculated using aruco::estimatePoseBoard().

Procedure:

R_gripper2base -> board2leftCam.inv()
R_target2cam -> board2rightCam
R_cam2gripper -> right2left

The output cam2gripper should therefore be the transform from the right camera to the left camera, but the result is very different from what stereo calibration gives.

Why is my calibration result wrong? Thank you in advance!

My Code:

/*
* R_lefCam2Board  => R_gripper2base
*/
std::vector<cv::Mat> rvecMat_lefCam2Board, tvecMat_lefCam2Board;
for (size_t i = 0; i < lefCamfilenames.size(); ++i) {
    cv::Mat matLef = cv::imread(lefCamfilenames[i]);
    vector<int> ids;
    vector<vector<Point2f>> corners, rejected;
    aruco::detectMarkers(matLef, dictionary, corners, ids, params, rejected);
    cv::Vec3d rvec, tvec; // board pose in the left camera frame
    aruco::estimatePoseBoard(corners, ids, board, lefCam.mtx, lefCam.dist, rvec, tvec); // left camera intrinsics (assumes a lefCam struct analogous to botCam)
    cv::Mat pose = vec3d2Mat44(rvec, tvec); // pose is double
    cv::Mat pose_inv = pose.inv();
    pose_inv.convertTo(pose_inv, CV_32F);

    cv::Mat R_lefCam2Board, T_lefCam2Board;
    mat44ToRT_(pose_inv, R_lefCam2Board, T_lefCam2Board);
    rvecMat_lefCam2Board.push_back(R_lefCam2Board);
    tvecMat_lefCam2Board.push_back(T_lefCam2Board);
}
/*
 * R_Board2botCam  => R_target2cam
 */
std::vector<cv::Mat> rvecMat_board2botCam, tvecMat_board2botCam;
for (size_t i = 0; i < botCamfilenames.size(); ++i) {
    cv::Mat matBot = cv::imread(botCamfilenames[i]);
    vector<int> ids;
    vector<vector<Point2f>> corners, rejected;
    aruco::detectMarkers(matBot, dictionary, corners, ids, params, rejected);
    cv::Vec3d rvec, tvec; // board pose in the bottom (right) camera frame
    aruco::estimatePoseBoard(corners, ids, board, botCam.mtx, botCam.dist, rvec, tvec);
    cv::Mat pose = vec3d2Mat44(rvec, tvec); // pose is double
    pose.convertTo(pose, CV_32F);

    cv::Mat R_board2botCam, T_board2botCam;
    mat44ToRT_(pose, R_board2botCam, T_board2botCam); // use the non-inverted board-to-camera pose here
    rvecMat_board2botCam.push_back(R_board2botCam);
    tvecMat_board2botCam.push_back(T_board2botCam);
}
/*
* Calculate result R_Bot2Left  => R_cam2gripper
*/
cv::Mat R_bot2lef, T_bot2lef;
cv::calibrateHandEye(rvecMat_lefCam2Board, tvecMat_lefCam2Board,
                     rvecMat_board2botCam, tvecMat_board2botCam,
                     R_bot2lef, T_bot2lef);
www
  • not sure if hand-eye is suitable for this. it's more for robotics applications, i.e. something involving manipulators. -- generally, you can just determine the global pose of each camera, and then work with those pose matrices (invert, multiply, ...) to get relative poses. that requires you to be able to determine the global position of a camera. you could construct an environment with AR markers at known positions (nominally/designed or measured). it's all about coordinate frames and jumping between them (transforming). -- I would *strongly* recommend ditching C++ for prototyping. it's insane – Christoph Rackwitz Jul 09 '22 at 13:34
  • you will need a robust notation for your coordinate frames and the transformation matrices between them. throw away the "->" and the "foo2bar" stuff. that will continually confuse you and everyone else who knows the math. -- I would suggest approaching the math notation, e.g. `T_dst_src` to express the transformation from `src` frame to `dst` frame, equivalently the pose of `dst` in the `src` frame. `dst_vec = T_dst_src @ src_vec` -- this allows matrices to compose like dominos: `T_cam2_cam1 = T_cam2_world @ T_world_cam1` where `T_cam2_world = inv(T_world_cam2)` – Christoph Rackwitz Jul 09 '22 at 13:37
  • note: using aruco you have transforms from marker space to camera space, i.e. `T_cam1_marker`. if both cameras see the same marker, and you need a transform to move from cam2 to cam1 (or pose of cam2, in cam1 frame), `T_cam1_cam2 = T_cam1_world @ T_world_cam2`, where `world == marker` and `T_world_cam2 = inv(T_cam2_world)` (a C++ sketch of this composition follows these comments) – Christoph Rackwitz Jul 09 '22 at 13:42
  • Thanks for the suggestion, "->" doesn't mean any math logic, I just use these inputs in my setup to replace the corresponding ones in OpenCV. "R_gripper2base" comes from OpenCV official documentation (https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html#gaebfc1c9f7434196a374c382abf43439b) , easy to understand, no problem. Thank u. – www Jul 10 '22 at 01:43
  • yeah official docs made a "design choice" there. I'm not happy with it ("gripper2base" adds cognitive load) but it's a fair choice to make. – Christoph Rackwitz Jul 10 '22 at 10:58
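
As a concrete illustration of the composition suggested in the comments above: if both cameras see the same board in a frame, the relative pose can be computed directly, without hand-eye calibration. This is only a minimal sketch; T_left_board and T_right_board are the 4x4 board-to-camera poses from aruco::estimatePoseBoard() (e.g. built with the question's vec3d2Mat44 helper), and the function name is illustrative.

#include <opencv2/core.hpp>

// Pose of the right camera expressed in the left camera frame, from one frame
// in which both cameras observe the same board.
// T_left_board / T_right_board: 4x4 board-to-camera transforms (CV_64F).
cv::Mat rightCamInLeftCam(const cv::Mat& T_left_board, const cv::Mat& T_right_board)
{
    // T_left_right = T_left_board * T_board_right = T_left_board * inv(T_right_board)
    return T_left_board * T_right_board.inv();
}

With several image pairs, one such estimate per pair can be averaged (or refined jointly over all board-point correspondences) to reduce noise.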

2 Answers


If you have a camera calibration problem without overlapping fields of view, then you could, for example, use CALICO.

CALICO is based on a method similar to what you are trying to do: it first uses hand-eye calibration to determine a rough estimate of the poses, and then fine-tunes them with bundle adjustment over the reprojection error.
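
For intuition, here is a hedged sketch (not CALICO's actual code) of the quantity such a refinement stage minimises: given a candidate inter-camera transform and, for one frame, the board pose in camera 1 plus the board corners detected in camera 2, the reprojection error in camera 2 can be evaluated as below. All variable and function names are illustrative.

#include <cmath>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// Mean reprojection error in camera 2 for one frame, given:
//   T_cam2_cam1 : candidate 4x4 transform (pose of cam1 in cam2's frame), CV_64F
//   rvec1,tvec1 : board pose in cam1 (from aruco::estimatePoseBoard)
//   boardPoints : 3D board corners in board coordinates
//   imagePoints2: matching 2D detections in cam2's image
//   K2, dist2   : cam2 intrinsics
double reprojErrorCam2(const cv::Mat& T_cam2_cam1,
                       const cv::Vec3d& rvec1, const cv::Vec3d& tvec1,
                       const std::vector<cv::Point3f>& boardPoints,
                       const std::vector<cv::Point2f>& imagePoints2,
                       const cv::Mat& K2, const cv::Mat& dist2)
{
    // Board pose in cam1 as a 4x4 homogeneous matrix.
    cv::Mat R1;
    cv::Rodrigues(rvec1, R1);
    cv::Mat T_cam1_board = cv::Mat::eye(4, 4, CV_64F);
    R1.copyTo(T_cam1_board(cv::Rect(0, 0, 3, 3)));
    for (int i = 0; i < 3; ++i)
        T_cam1_board.at<double>(i, 3) = tvec1[i];

    // Chain board -> cam1 -> cam2 and project into cam2.
    cv::Mat T_cam2_board = T_cam2_cam1 * T_cam1_board;
    cv::Mat tvec2 = T_cam2_board(cv::Rect(3, 0, 1, 3)).clone();
    cv::Mat rvec2;
    cv::Rodrigues(T_cam2_board(cv::Rect(0, 0, 3, 3)).clone(), rvec2);

    std::vector<cv::Point2f> projected;
    cv::projectPoints(boardPoints, rvec2, tvec2, K2, dist2, projected);

    double err = 0.0;
    for (size_t i = 0; i < projected.size(); ++i) {
        const cv::Point2f d = projected[i] - imagePoints2[i];
        err += std::sqrt(double(d.x * d.x + d.y * d.y));
    }
    return projected.empty() ? 0.0 : err / projected.size();
}

A CALICO-style pipeline would sum this error over all frames and cameras and minimise it over the rig extrinsics (and optionally the board poses), using the hand-eye result only as a starting point.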

oguked

To externally calibrate multiple cameras without overlapping views (i.e. find their locations and orientations with respect to one another) you need to observe in all cameras a common reference object (a.k.a. "calibration target"), or a set of objects, of known shape, location(s) and orientation(s).

This can be achieved in several ways:

  • There is a fixed object, observable in one camera at a time, and the rig is moved with a known motion so that the object becomes visible in each camera in turn. Example: cameras mounted on a turntable, which rotates to "present" a calibration target to each camera (see the sketch after this list).
  • There is a fixed object, large enough that different parts of it are observable in all cameras, and the geometrical relationships between those parts are known. Example: all cameras look at the night sky, and the ephemerides of the observed stars/constellations are known.
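
A minimal sketch of the first bullet's turntable case, assuming the rig motion between the two shots is known as a 4x4 transform M expressed in the target (world) frame; the helper name is illustrative.

#include <opencv2/core.hpp>

// T_cam1_target: board pose in camera 1, measured before the known motion (4x4, CV_64F).
// T_cam2_target: board pose in camera 2, measured after the known motion.
// M            : the known rig motion between the two shots, expressed in the
//                target/world frame (e.g. a rotation about the turntable axis).
// Returns the pose of camera 2 in camera 1's frame.
cv::Mat cam2InCam1(const cv::Mat& T_cam1_target,
                   const cv::Mat& T_cam2_target,
                   const cv::Mat& M)
{
    // Camera poses in the target frame at the time of their respective shots.
    cv::Mat T_target_cam1       = T_cam1_target.inv();
    cv::Mat T_target_cam2_after = T_cam2_target.inv();

    // Undo the known motion to express camera 2 at the first shot's instant,
    // then take the relative pose.
    cv::Mat T_target_cam2 = M.inv() * T_target_cam2_after;
    return T_target_cam1.inv() * T_target_cam2;
}

In practice M would come from a turntable encoder or a known commanded rotation, and its accuracy directly limits the accuracy of the recovered extrinsics.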

The key point is that in all images you need to "see" something from which you can infer the camera's location and orientation, and this something must appear in the cameras in a way that lets you relate those locations and orientations to one another.

Francesco Callari
  • Thanks for your reply. If there is no overlap between the cameras, it is difficult in practice to arrange a stationary object that works for all of them. That's why I tried to use the hand-eye calibration method to solve it. – www Jul 11 '22 at 13:01
  • If you are talking of hand-eye calibration, you already have a robot manipulator for the camera or the object. Choose which one you want to move. If you do not have a robot manipulator (even a turntable), then why are you talking about hand-eye calibration at all? – Francesco Callari Jul 11 '22 at 15:05
  • My problem takes the form of the system AX = XB, which can be solved using hand-eye calibration techniques. If I'm doing something wrong in the process, I'd appreciate if you could help me figure it out. – www Jul 12 '22 at 08:58
  • It doesn't seem (to me) that you have thought your problem through thoroughly enough. Drop the maths, try to gain an intuition for the geometry of your problem first, and for what's physically possible to solve it, and only then move to modeling and engineering the solution. Calibration is all about repeatability of measurements over time. To be able to tell that a measurement is consistently the same, you need a consistent yardstick, one that is used for all the measurements. Think: what can you use as a yardstick in your situation. – Francesco Callari Jul 14 '22 at 14:14