ARKit intrinsic differences between portrait and landscape?

Question

I am implementing marker tracking for ARKit with OpenCV's ArUco module. Results are good when the camera is in portrait orientation, but there is a slight offset in landscape.

ArUco markers at known positions:

Detection in portrait mode works:

In landscape orientation, detection is showing an offset:

In detail what I am doing:

For each ARFrame, I do the following:
1. Get the CVPixelBuffer's height, width and base address and convert it to a cv::Mat.
2. Run marker detection and pose estimation (cv::aruco::detectMarkers, cv::aruco::estimatePoseSingleMarkers) using the intrinsics from the ARFrame.
   - The intrinsics need to be transposed to go from ARKit's column-major to OpenCV's row-major matrix storage.
   - The OpenCV rvec and tvec are converted into a 4x4 transform using cv::Rodrigues, and that transform is converted from OpenCV to OpenGL coordinate space by diag(1,-1,-1,1) * transform.
   - The result is converted back from row-major to column-major and is the transform of the marker in camera space.
3. Multiply that transform with the ARCamera's transform. This gives the marker plane in world coordinates, which I visualise as a green rectangle.
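
The conversion steps above can be sketched as follows. This is a minimal, standard-library-only illustration and not the code from the question: the names Mat4, poseToTransform, openCVToOpenGL, multiply and toColumnMajor are hypothetical, and the Rodrigues formula is written out by hand as a stand-in for cv::Rodrigues so the sketch compiles without OpenCV.

```cpp
#include <array>
#include <cmath>

// Row-major 4x4 matrix, matching how OpenCV lays out a cv::Mat.
using Mat4 = std::array<std::array<double, 4>, 4>;

// Builds the rigid transform [R | t; 0 0 0 1] from an axis-angle rvec and a tvec.
// The rotation uses the Rodrigues formula, written out by hand here as a
// stand-in for cv::Rodrigues (an assumption made to avoid the OpenCV dependency).
Mat4 poseToTransform(const std::array<double, 3>& rvec,
                     const std::array<double, 3>& tvec) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;  // start from identity

    const double theta =
        std::sqrt(rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]);
    if (theta > 1e-12) {
        const double x = rvec[0] / theta, y = rvec[1] / theta, z = rvec[2] / theta;
        const double c = std::cos(theta), s = std::sin(theta), t = 1.0 - c;
        m[0][0] = t * x * x + c;     m[0][1] = t * x * y - s * z; m[0][2] = t * x * z + s * y;
        m[1][0] = t * x * y + s * z; m[1][1] = t * y * y + c;     m[1][2] = t * y * z - s * x;
        m[2][0] = t * x * z - s * y; m[2][1] = t * y * z + s * x; m[2][2] = t * z * z + c;
    }
    for (int i = 0; i < 3; ++i) m[i][3] = tvec[i];
    return m;
}

// Computes diag(1, -1, -1, 1) * m, which negates the second and third rows
// (OpenCV: y down, z forward; OpenGL: y up, z backward).
Mat4 openCVToOpenGL(Mat4 m) {
    for (int col = 0; col < 4; ++col) {
        m[1][col] = -m[1][col];
        m[2][col] = -m[2][col];
    }
    return m;
}

// Plain 4x4 product a * b, e.g. cameraTransform * markerInCameraSpace.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Flattens a row-major matrix into column-major storage, the layout used by
// ARKit's simd matrices.
std::array<double, 16> toColumnMajor(const Mat4& m) {
    std::array<double, 16> out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[col * 4 + row] = m[row][col];
    return out;
}
```

With these helpers, the marker's world transform would be multiply(cameraTransform, openCVToOpenGL(poseToTransform(rvec, tvec))), flattened with toColumnMajor before being handed back to ARKit. Here cameraTransform stands for the ARCamera transform already converted to row-major.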

My questions:

Am I missing anything?
Should frame.displayTransform play any part in the conversion?
Why do the intrinsics change when rotating the device? The width and height of the pixel buffer do not change.
Any other ideas?

Update 25.07.2017:

I figured this out! This is a bug from Apple! They messed up the intrinsics between UIInterfaceOrientation.landscapeLeft and landscapeRight. If you cache these values and swap them, then everything works great.
iOS 11 Beta 4 does not change anything.
I am keeping this question open, until it is resolved by Apple (Bug ID 33519315 on Radar).
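
The cache-and-swap workaround described in this update might look roughly like the sketch below. Everything in it is hypothetical: the Orientation enum only mirrors the UIInterfaceOrientation cases, and IntrinsicsCache with its store and corrected methods is an illustrative name, not an ARKit or OpenCV API. It assumes the intrinsics for each orientation have been observed at least once before they are needed.

```cpp
#include <array>
#include <map>

// Hypothetical stand-in for UIInterfaceOrientation.
enum class Orientation { Portrait, PortraitUpsideDown, LandscapeLeft, LandscapeRight };

// 3x3 camera intrinsics, stored row-major as OpenCV expects.
using Intrinsics = std::array<double, 9>;

class IntrinsicsCache {
public:
    // Remember the intrinsics ARKit reported while in the given orientation.
    void store(Orientation o, const Intrinsics& k) { cache_[o] = k; }

    // Look up the intrinsics to use for the given orientation, taking the
    // values cached for the opposite landscape orientation.
    // Returns false if the needed entry has not been cached yet.
    bool corrected(Orientation o, Intrinsics& out) const {
        const auto it = cache_.find(swapLandscape(o));
        if (it == cache_.end()) return false;
        out = it->second;
        return true;
    }

private:
    // landscapeLeft <-> landscapeRight; portrait orientations are unchanged.
    static Orientation swapLandscape(Orientation o) {
        if (o == Orientation::LandscapeLeft) return Orientation::LandscapeRight;
        if (o == Orientation::LandscapeRight) return Orientation::LandscapeLeft;
        return o;
    }

    std::map<Orientation, Intrinsics> cache_;
};
```

The bool return makes the "not cached yet" case explicit, so a caller can fall back to the intrinsics reported by the current frame until both landscape orientations have been seen.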

Update 14.09.2017:

Apple closed the bug, saying everything is correct. I am not sure they are right, but the problem may really lie in how OpenCV and ARKit fit together.

Thanks for the detailed description of your approach and the followup! I'm trying to do something very similar but am stuck at the conversion of rvec & tvec to a transform matrix. Can you be more specific about how you're using the Rodrigues function to create a usable (camera space) transform matrix for the markers? Would be greatly appreciated. — KennyDeriemaeker, Sep 25 '17 at 09:29
@KennyDeriemaeker I am sure this is too late, but I've been working on a demo that does something similar. See ~line 87 in OpenCVWrapper.mm https://github.com/pukeanddie/aruco-arkit-localizer — nwales, Feb 26 '18 at 20:14

