I'm working on a project that uses the Kinect and OpenCV to export fingertip coordinates to Flash for use in games and other programs. Currently, our setup works based on color and exports fingertip points to Flash in (x, y, z) format, where x and y are in pixels and z is in millimeters.
However, we want to map those (x, y) coordinates to "real world" values, such as millimeters, using that z depth value from within Flash.
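For concreteness, this is the kind of conversion we're after, sketched in Python (the function name and parameters are placeholders, and the real-world frame width/height at a given depth are exactly the values the rest of this question tries to derive):

```python
def pixel_to_mm(x_px, y_px, z_mm, frame_w_px, frame_h_px, fov_w_mm, fov_h_mm):
    # Map a pixel to real-world millimeters, given the frame's real-world
    # width and height (fov_w_mm, fov_h_mm) at depth z_mm.
    # Pixel (0, 0) is assumed to be the top-left corner of the frame.
    x_mm = (x_px / frame_w_px - 0.5) * fov_w_mm
    y_mm = (y_px / frame_h_px - 0.5) * fov_h_mm
    return (x_mm, y_mm, z_mm)
```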
As I understand it, the Kinect's 3D depth is obtained by projecting the X-axis along the camera's horizontal, its Y-axis along the camera's vertical, and its Z-axis directly forward out of the camera's lens. Depth values are then the length of the perpendicular drawn from any given object to the XY-plane. See the picture at the link below (taken from Microsoft's website).
Microsoft Depth Coordinate System Example
We also know that the Kinect's horizontal field of view spans a 117-degree angle.
Using this information, I figured I could take the depth value of any given point, project it onto the x = 0, y = 0 line, and draw a horizontal line parallel to the XY-plane at that depth, intersecting the camera's field of view. I end up with an isosceles triangle, split in half by the Z-axis, whose height is the depth of the object in question. I can then solve for the width of the field of view at that depth with a little trigonometry. My equation is:
W = tan(theta / 2) * h * 2
Where:
- W = field of view width at depth h
- theta = horizontal field of view angle (117 degrees)
- h = depth value
(Sorry, I can't post a picture, I would if I could)
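In code form, the same equation looks like this (a quick Python sketch, just plugging in the 117-degree figure from above):

```python
import math

def frame_width_mm(depth_mm, fov_deg=117.0):
    # W = 2 * h * tan(theta / 2)
    return 2.0 * depth_mm * math.tan(math.radians(fov_deg / 2.0))

print(frame_width_mm(1000.0))  # ~3263.96 mm
```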
Now, solving for a depth value of 1000 mm (1 meter) gives a width of about 3264 mm.
However, when actually LOOKING at the image the camera produces, I get a different value. Namely, I placed a meter stick 1 meter away from the camera and noticed that the width of the frame was at most 1.6 meters, not the 3.264 meters estimated from the calculation.
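As a sanity check (my own back-calculation, not a figure from any spec), inverting the formula on that measurement shows what horizontal FOV a 1.6-meter frame width at 1 meter would actually imply:

```python
import math

def implied_fov_deg(width_mm, depth_mm):
    # Invert W = 2 * h * tan(theta / 2) to solve for theta.
    return math.degrees(2.0 * math.atan(width_mm / (2.0 * depth_mm)))

print(implied_fov_deg(1600.0, 1000.0))  # ~77.3 degrees, nowhere near 117
```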
Is there something I'm missing here? Any help would be appreciated.