
I'm trying to map the camera pose from ArUco tracking back to the camera position in SceneKit. It's almost working, but the tracking is really unstable, and the conversion to the SceneKit camera pose appears to be incorrect: the object floats over the marker in the camera view and moves about as I move the camera around. Can anyone see what I may be doing wrong here in the conversion back to the SceneKit camera rotation and position vectors?

@interface ArucoPayload : NSObject

@property BOOL valid; // Used to determine if the tracking was valid and hides the scenekit nodes if not
@property UIImage *image;
@property CGRect boardSize;
@property SCNVector4 rotationVector;
@property SCNVector3 translationVector;
@end

Mat rvec(3, 1, DataType<double>::type); // board rotation relative to the camera, as a Rodrigues vector
Mat tvec(3, 1, DataType<double>::type); // board translation relative to the camera

...
aruco::estimatePoseBoard(corners, markerIds, gridBoard, self.camMatrix, self.distCoeffs, rvec, tvec);
[self updateCameraProjection:payload withRotation:rvec andTranslation:tvec];
...

-(void) updateCameraProjection:(ArucoPayload *)payload withRotation:(Mat)rvec andTranslation:(Mat)tvec {

    // RotX = diag(1, -1, -1): a 180-degree rotation about X, converting between
    // the OpenCV camera frame (Y down, Z forward) and SceneKit's (Y up, Z backward).
    cv::Mat RotX(3, 3, cv::DataType<double>::type);
    cv::setIdentity(RotX);
    RotX.at<double>(4) = -1; // linear index 4 == element (1,1)
    RotX.at<double>(8) = -1; // linear index 8 == element (2,2)

    // Expand the Rodrigues rotation vector into a 3x3 rotation matrix.
    cv::Mat R;
    cv::Rodrigues(rvec, R);

    // Invert the extrinsics: for a rotation matrix the transpose is the inverse,
    // so R now maps camera coordinates back to world coordinates.
    R = R.t();
    Mat rvecConverted;
    Rodrigues(R, rvecConverted);
    rvecConverted = RotX * rvecConverted;

    // Camera position in world coordinates: C = -R^T * tvec.
    Mat tvecConverted = -R * tvec;
    tvecConverted = RotX * tvecConverted;

    // SceneKit's rotation property is axis-angle: (x, y, z, angle in radians).
    payload.rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
    payload.translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));
}

func updateCameraPosition(payload: ArucoPayload) {

    if payload.valid {

        sceneView.scene?.rootNode.isHidden = false

        // Add nodes the first time we get an updated position
        if sceneView.scene?.rootNode.childNodes.count == 1 {

            // Add box node
            addBoxNode(to: sceneView, payload: payload)
        }

        // Drive the camera node from the pose converted on the OpenCV side
        cameraNode.rotation = payload.rotationVector
        cameraNode.position = payload.translationVector

    } else {

        sceneView.scene?.rootNode.isHidden = true
    }
}

The drawing done in OpenCV is correct, and the axes and frame around the ArUco board track accurately, as can be seen in the video.

Any help is much appreciated. Here is a video of the scene; the yellow object, which should be locked to the position of the marker, is very unstable.

https://youtu.be/ZvKtZ3DNdrk

Camera Calibration:

// Wait until we have captured enough frames
if(self.numberOfFramesForCalibration == 0) {

    NSLog(@"Starting calibration with 20 images");

    vector< vector< Point2f > > allCornersConcatenated;
    vector< int > allIdsConcatenated;
    vector< int > markerCounterPerFrame;
    Mat cameraMatrix, distCoeffs;
    vector< Mat > rvecs, tvecs;
    double repError;
    int calibrationFlags = 0;

    // prepare data for calibration
    markerCounterPerFrame.reserve(allCorners.size());
    for(unsigned int i = 0; i < allCorners.size(); i++) {
        markerCounterPerFrame.push_back((int)allCorners[i].size());
        for(unsigned int j = 0; j < allCorners[i].size(); j++) {
            allCornersConcatenated.push_back(allCorners[i][j]);
            allIdsConcatenated.push_back(allIds[i][j]);
        }
    }

    // calibrate camera
    repError = aruco::calibrateCameraAruco(allCornersConcatenated, allIdsConcatenated,
                                           markerCounterPerFrame, self.data.board, imgSize, cameraMatrix,
                                           distCoeffs, rvecs, tvecs);

    bool saveOk = [self saveCameraParams:imgSize aspect:1 flags:calibrationFlags matrix:cameraMatrix coeff:distCoeffs avgErr:repError];
    if(saveOk) {

        self.calibrationRequired = false;

    }
}   
d0n13
  • Why `Mat tvecConverted = -R * tvec;`? I don't think the translation vector is rotated in a way that needs to be rotated back. – fireant Jun 30 '19 at 20:07

1 Answer


Unfortunately, this isn’t a complete example, so it’s unclear where the error is. Anyway, there still seems to be a possible issue in what you’ve posted, so maybe my talking through my confusion will be helpful.

RotX seems intended to account for axis differences between OpenCV (X right, Y down, Z in) and SceneKit (X right, Y up, Z out).
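Concretely, given the setIdentity call and the assignments at linear indices 4 and 8, that matrix is

RotX = |  1   0   0 |
       |  0  -1   0 |
       |  0   0  -1 |

i.e. a 180° rotation about the X axis.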

tvec and rvec represent the world origin relative to the OpenCV camera.

R.t() is the same as R.inv() for orthonormal rotation matrices, so after R = R.t(), R holds the inverse of the rotation encoded by the original rvec.

So Mat tvecConverted = -R * tvec; (or, more clearly, R * -tvec) is the camera position in world coordinates, and Rodrigues(R, rvecConverted); is the analogous world-frame conversion for rvec.
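To spell out that inversion (standard pinhole-camera algebra): if points map into the camera frame as x_cam = R₀ · x_world + tvec, then solving for the world point gives

x_world = R₀ᵀ · x_cam − R₀ᵀ · tvec

so the camera centre in world coordinates is C = −R₀ᵀ · tvec and the camera-to-world rotation is R₀ᵀ, which is exactly what the transposed R computes.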

Then each is multiplied by RotX to take OpenCV coordinates to SceneKit coordinates, and the results are assigned to payload.

UPDATE: After your code update, we can see that the values assigned to payload are ultimately assigned to the cameraNode position and rotation. Since a SceneKit SCNCamera always points along the negative Z-axis of its parent SCNNode with the same orientation, positioning and orienting the parent SCNNode positions and orients the camera itself. That suggests tvecConverted and rvecConverted above are correct, since the camera's SCNNode seems parented to the root node.

However, there's still the matter of projecting from the SceneKit camera's space to the pixel space of the display, which doesn't seem to be in the code excerpt that you posted. I suspect this will need to match the intrinsic camera matrix and distortion used when drawing on the OpenCV side for them to align correctly.

In the SCNCamera docs page, the projectionTransform property is described like this:

This transformation expresses the combination of all the camera’s geometric properties: projection type (perspective or orthographic), field of view, depth limits, and orthographic scale (if applicable). SceneKit uses this transformation to convert points in the camera node’s coordinate space to the renderer’s 2D space when rendering and processing events.

You can use this transformation directly if your app needs to convert between view and renderer coordinates for other purposes. Alternatively, if you compute your own projection transform matrix, you can set this property to override the transformation synthesized from the camera’s geometric properties.

Since you haven't posted code related to camera intrinsics, it's unclear if this is the issue or not. However, I suspect aligning the intrinsics between OpenCV and SceneKit will solve the problem. Also, SceneKit transforms are evaluated with respect to a node's pivot property, so you'll want to be careful if you're using that (e.g. to place a SCNBox relative to its corner rather than its center.)
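As a rough sketch of the kind of intrinsics alignment I mean, here's how the calibrated values might be turned into a SceneKit projection matrix. This is a hypothetical helper, not code from the question: fx, fy, cx, cy would come from your OpenCV cameraMatrix, width and height are the calibration image size in pixels, and zNear/zFar are arbitrary clip planes. Sign conventions for the principal-point terms vary between references, so a flip may be needed, and lens distortion isn't modelled here at all:

import SceneKit

// Hypothetical helper: build a SceneKit projection matrix from OpenCV intrinsics.
func intrinsicsToProjection(fx: Float, fy: Float, cx: Float, cy: Float,
                            width: Float, height: Float,
                            zNear: Float, zFar: Float) -> SCNMatrix4 {
    var m = SCNMatrix4Identity
    m.m11 = 2 * fx / width                    // focal length, X
    m.m22 = 2 * fy / height                   // focal length, Y
    m.m31 = 1 - 2 * cx / width                // principal point offset, X
    m.m32 = 2 * cy / height - 1               // principal point offset, Y (OpenCV's Y is down)
    m.m33 = -(zFar + zNear) / (zFar - zNear)  // depth remapping
    m.m34 = -1                                // perspective divide
    m.m43 = -2 * zFar * zNear / (zFar - zNear)
    m.m44 = 0
    return m
}

// e.g. cameraNode.camera?.projectionTransform =
//          intrinsicsToProjection(fx: ..., fy: ..., cx: ..., cy: ...,
//                                 width: 1280, height: 720, zNear: 0.01, zFar: 100)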

rob3c
  • hi @rob3c I've updated the code above to show you how I use the payload class. It's just used to pass the info back to my SceneKit class. Thank you for the help. – d0n13 Jul 03 '19 at 21:06
  • @d0n13 That’s helpful. Some drawing details are missing. I assume the yellow object is created in `addBoxNode`? Are you setting `pivot` for it? The marker is drawn in opencv, right? Does that use corners, but the yellow object uses a center position with declared box dimensions? I’m wondering about alignment details. Also, I assume the single child node checked for in the first pass is the camera node, right? – rob3c Jul 04 '19 at 10:17
  • @d0n13 Can you post a frame from your video and the actual variable and object values being used so we have a concrete example to work with? It looks like the above assignments are correct since scenekit camera uses its parent node's position and rotation and looks down the -Z axis. But that still leaves the intrinsics transform on the scenekit side to pixel space. For example, what's the FOV and such for the camera? And does the yellow object box have a pivot set? – rob3c Jul 05 '19 at 17:34
  • hi @rob3c, I've been away from the keyboard for a while (last 8 weeks). Would you still be able to help me out on this? – d0n13 Sep 02 '19 at 10:54
  • I generate the camera intrinsics when the app is installed by detecting the markers and then just taking photos of it from different angles. The app will take up to 20 photos before running the openCV calibrateCameraAruco – d0n13 Sep 02 '19 at 10:57
  • Code updated above. This camera intrinsics is then read into a camMatrix and distCoeffs Mat when the app starts up in future to avoid having to recalibrate again. – d0n13 Sep 02 '19 at 11:08
  • The yellow object does not have a pivot set @rob3c – d0n13 Sep 02 '19 at 11:50
  • @d0n13 I mentioned projectionTransform and linked to docs because it looks like you're not using opencv intrinsics in scenekit for its rendering. You only seem to use rvec/tvec (i.e. extrinsics) to position the scenekit camera. However, you also probably need to modify the scenekit projectionTransform so it matches opencv intrinsics. Otherwise, FOV, optical axis offset, etc, won't match when drawing on the same image with both opencv and scenekit. In other words, your mapping from world space to camera space matches, but not the one from camera space to pixel space. – rob3c Sep 03 '19 at 01:16
  • The code above shows how I have set the payload.rotationVector and payload.translationVector, which are passed to the SceneKit camera via cameraNode.rotation and cameraNode.position. This is obviously incorrect? Sorry, I'm not too well up on this area yet, so I appreciate the help. – d0n13 Sep 03 '19 at 15:49
  • @d0n13 Those are *extrinsic* camera params `tvec` and `rvec` describing the camera location in world coordinates, i.e. the transform between world space and camera space; they change when the camera moves. But that's only half of it. The camera also has *intrinsic* params `distCoeffs` and `cameraMatrix` describing how light entering the lens transforms to pixels, regardless of the camera's position or orientation, due to lens distortion and internal sensor geometry affecting light detection. They describe the transform from camera space to image space. opencv and scenekit need the same values. – rob3c Sep 03 '19 at 16:51
  • So how do I go about setting those within SceneKit? I presume I have the correct values in opencv available to me? @rob3c – d0n13 Sep 03 '19 at 17:40
  • I understand now, after reading up on what you are prompting me with, that I need to set all three parameters. I suspect that my position is almost correct, as it appears in the video more or less where I expected it to be, and I can move it with the pivot to get my coordinates in order. The jumpiness of the image is due to not having set the intrinsic parameters from the cameraMatrix and distortion coefficients. However, I'm not having any luck figuring out how to do that yet or finding a solid example. Any help much appreciated, or pointers to some code I may have missed so far... – d0n13 Sep 03 '19 at 19:56
  • @d0n13 I'd try assigning the `cameraMatrix` calculated in opencv to the scenekit `SCNCamera` node's `projectionTransform` property to see if it improves the match. It'll need to be converted from 3x3 to homogeneous 4x4 form (see camera matrix on wikipedia), and it also probably needs to account for the coordinate direction difference between opencv and scenekit via multiplication by `RotX`. – rob3c Sep 07 '19 at 08:56
  • Thanks @rob3c. I’ll let you know if I can figure that out and get it working. – d0n13 Sep 07 '19 at 13:43