
I'm currently working on an augmented reality application using a medical imaging program called 3DSlicer. My application runs as a module within the Slicer environment and is meant to provide the tools necessary to use an external tracking system to augment a camera feed displayed within Slicer.

Currently, everything is configured so that all I have left to do is automate the calculation of the camera's extrinsic matrix, which I decided to do using OpenCV's solvePnP() function. Unfortunately, this has been giving me some difficulty, as I am not getting correct results.

My tracking system is configured as follows:

  • The optical tracker is mounted in such a way that the entire scene can be viewed.
  • Tracked markers are rigidly attached to a pointer tool, the camera, and a model that we have acquired a virtual representation for.
  • The pointer tool's tip was registered using a pivot calibration. This means that any values recorded using the pointer indicate the position of the pointer's tip.
  • Both the model and the pointer have 3D virtual representations that augment a live video feed as seen below.
  • The pointer and camera (referred to as C from here on) markers each return a homogeneous transform that describes their position relative to the marker attached to the model (referred to as M from here on). The model's marker, being the origin, does not return any transformation.

Depiction of a correctly augmented environment. The extrinsic was adjusted manually to demonstrate that the process is correct, but it is inaccurate.

I obtained two sets of points, one 2D and one 3D. The 2D points are the coordinates of a chessboard's corners in pixel coordinates, while the 3D points are the corresponding world coordinates of those same corners relative to M. The 2D points were recorded using OpenCV's findChessboardCorners() function, and the 3D points with the tracked pointer. I then transformed the 3D points from M space to C space by multiplying them by the inverse of C's transform. This was done because the solvePnP() function requires that the 3D points be described relative to the world coordinate system of the camera, which in this case is C, not M.
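In rough Python form, the procedure looks like this (a minimal sketch; frame, pointer_recorded_corners, T_CM, K, and dist_coeffs are placeholders for my actual image, pointer data, tracker transform, and camera intrinsics):

```python
import cv2
import numpy as np

# 2D: chessboard corners in pixel coordinates (9x6 inner corners assumed here).
pattern_size = (9, 6)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, pattern_size)
image_points = corners.reshape(-1, 2).astype(np.float32)            # (N, 2)

# 3D: the same corners, recorded with the tracked pointer, expressed in M space.
points_M = np.asarray(pointer_recorded_corners, dtype=np.float64)   # (N, 3)

# T_CM: 4x4 transform of the camera marker C as reported by the tracker
# (C relative to M). Its inverse maps M-space points into C space.
points_M_h = np.hstack([points_M, np.ones((len(points_M), 1))])
points_C = (np.linalg.inv(T_CM) @ points_M_h.T).T[:, :3]

# K and dist_coeffs come from a prior intrinsic calibration of the camera.
ok, rvec, tvec = cv2.solvePnP(points_C.astype(np.float32), image_points,
                              K, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix of the estimated extrinsic
```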

Once all of this was done, I passed the point sets into solvePnP(). The transformation I got back was completely incorrect, though. I am honestly at a loss as to what I did wrong. Adding to my confusion is the fact that OpenCV uses a different camera coordinate convention from OpenGL, which is what 3DSlicer is based on. If anyone can provide some assistance in this matter, I would be exceptionally grateful.
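(For what it's worth, my understanding is that the two conventions differ only by a flip of the camera's Y and Z axes, along the lines of the sketch below, though I'm not sure whether that is actually where my error lies. R and tvec here are the solvePnP outputs from above.)

```python
import numpy as np

# World-to-camera extrinsic in OpenCV's convention.
E_cv = np.eye(4)
E_cv[:3, :3] = R
E_cv[:3, 3] = tvec.ravel()

# OpenCV camera axes: +X right, +Y down, +Z into the scene.
# OpenGL camera axes: +X right, +Y up,   -Z into the scene.
# Flipping the camera's Y and Z axes converts one convention into the other.
cv_to_gl = np.diag([1.0, -1.0, -1.0, 1.0])
E_gl = cv_to_gl @ E_cv
```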

Also if anything is unclear, please don't hesitate to ask. This is a pretty big project so it was hard for me to distill everything to just the issue at hand. I'm wholly expecting that things might get a little confusing for anyone reading this.

Thank you!

UPDATE #1: It turns out I'm a giant idiot. I recorded only collinear points because I was too impatient to record the entire checkerboard. Of course this meant that there were nearly infinite solutions to the least-squares problem, as I had only constrained the solution in two dimensions! My values are much closer to my ground truth now, and in fact the rotation columns seem correct, except that they're completely out of order. I'm not sure what could cause that, but it appears my rotation matrix was mirrored across the center column. In addition, my translation components are negative when they should be positive, although their magnitudes seem to be correct. So now I've basically got all the right values in all the wrong order.
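For anyone else who runs into this, a quick degeneracy check on the 3D points would have caught it. Something like the following rough numpy sketch (reusing the points_C placeholder from above) flags collinear point sets before they ever reach solvePnP:

```python
import numpy as np

def point_set_rank(points_3d, tol=1e-6):
    # Rank of the centered point cloud: 1 = collinear, 2 = coplanar, 3 = general.
    centered = points_3d - points_3d.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(singular_values > tol * singular_values[0]))

# Rank-1 (collinear) points leave solvePnP under-constrained;
# rank 2 (a planar chessboard) is fine, rank 3 is a general configuration.
assert point_set_rank(points_C) >= 2, "3D points are (nearly) collinear"
```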

SwarthyMantooth
  • Welcome to mirror ambiguities. Easy to solve, though: you basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform by an appropriate ("hand-built") rotation and/or mirroring. – Francesco Callari Sep 22 '15 at 14:50
  • Oh dang. So you're saying I should just manually manipulate the transform so that everything is in the correct place? I can assume that this sort of issue will be consistent across all calculations due to the mirroring of the image? – SwarthyMantooth Sep 22 '15 at 16:30

1 Answer


Mirror/rotational ambiguity.

You basically need to reorient your coordinate frames by imposing the constraints that (1) the scene is in front of the camera and (2) the checkerboard axes are oriented as you expect them to be. This boils down to multiplying your calibrated transform by an appropriate ("hand-built") rotation and/or mirroring.
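As a rough sketch of what I mean (Python/numpy; R and t are the world-to-camera rotation and translation recovered from solvePnP, points_world are your 3D corners, and the specific hand-built corrections below are only examples to adapt to your setup):

```python
import numpy as np

# Constraint (1): the scene must lie in front of the camera (positive depth).
depths = (R @ points_world.T + t.reshape(3, 1))[2]
if np.all(depths < 0):
    # One possible hand-built fix: mirror the camera's Y and Z axes so the
    # scene flips back in front of the camera.
    M = np.diag([1.0, -1.0, -1.0])
    R, t = M @ R, M @ t

# Constraint (2): the recovered board axes should point the way you expect,
# e.g. the board's X axis roughly rightward in the camera image.
if R[0, 0] < 0:
    # Another hand-built fix: rotate the board frame 180 degrees about its Z axis.
    Rz180 = np.diag([-1.0, -1.0, 1.0])
    R = R @ Rz180
```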

The basic problem is that the calibration target you are using, even when all the corners are seen, has at least a 180-degree rotational ambiguity unless color information is used. If some corners are missed, things can get even weirder.
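Concretely, for a plain chessboard the 180-degree ambiguity amounts to reversing the detected corner ordering, and both orderings are equally consistent with the image (sketch; object_points, image_points, K, and dist are placeholders for your own data):

```python
import cv2

# Two equally valid interpretations of the same detected corners: the board as
# labeled, or the board rotated 180 degrees in its own plane (which simply
# reverses the corner order for a symmetric chessboard grid).
ok1, rvec1, tvec1 = cv2.solvePnP(object_points, image_points, K, dist)
ok2, rvec2, tvec2 = cv2.solvePnP(object_points[::-1].copy(), image_points, K, dist)

# Without color or an asymmetric pattern, only prior knowledge of the board's
# orientation in the scene tells you which of the two poses is the right one.
```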

You can often use prior information about the camera orientation w.r.t. the scene to resolve this kind of ambiguity, as I was suggesting above. However, in more dynamic situations, or if a further degree of automation is needed when the target may be only partially visible, you'd be much better off using a target in which each small chunk of corners can be individually identified. My favorite is Matsunaga and Kanatani's "2D barcode" one, which uses sequences of square lengths with unique cross-ratios. See the paper here.
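(For the curious: the cross-ratio of four collinear points is invariant under perspective projection, which is what makes such length sequences identifiable from any viewpoint. A quick numeric illustration:)

```python
import numpy as np

def cross_ratio(a, b, c, d):
    # Cross-ratio of four collinear points given by their 1D coordinates.
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

# A 1D projective (perspective-like) map x -> (2x + 1) / (0.3x + 4).
warp = lambda x: (2 * x + 1) / (0.3 * x + 4)

pts = np.array([0.0, 1.0, 2.5, 4.0])
print(cross_ratio(*pts))         # 1.25
print(cross_ratio(*warp(pts)))   # also 1.25: the cross-ratio is preserved
```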

Francesco Callari
  • Makes sense to me. I'll test it out and see how it influences the results! – SwarthyMantooth Sep 22 '15 at 17:28
  • So multiplying by a 180-degree rotation didn't work on any of the axes. To demonstrate the issue, I've provided the manually calibrated ground-truth extrinsic and then the extrinsic calculated using solvePnP. I also expanded the data set to make sure that it's not an issue of local minima. EDIT: Can't get the formatting to work. Will add to the question in about 30 minutes as an update. – SwarthyMantooth Sep 22 '15 at 18:27
  • I did not say that your particular ambiguity was the 180-degree one; without looking at the data it is hard to tell. I suggest you visualize the rotation matrix you get in relation to the camera frame (colored pencils and rubber rings help). Remember that the rotation you get is world-to-camera, i.e. the columns of the rotation matrix are, in order, the components in the camera frame of the world (i.e. target) coordinate axes. – Francesco Callari Sep 22 '15 at 18:48
  • Just an update, I found out that SolvePnP assumes a z axis going into the camera as there is no way for the camera to see the checkerboard if it's rotated the other way. However, as the issue is now out of scope of the original question, I've written a new question (Linked below if you want to give it a shot) and I'll accept your answer! http://stackoverflow.com/questions/32851702/extrinsic-calibration-with-cvsolvepnp – SwarthyMantooth Sep 30 '15 at 00:13