I'm working on a project that uses the Kinect and OpenCV to export fingertip coordinates to Flash for use in games and other programs. Currently, our setup works based on color and exports fingertip points to Flash in (x, y, z) format, where x and y are in pixels and z is in millimeters.
However, we want to map those (x, y) coordinates to "real world" values, such as millimeters, using that z depth value from within Flash.
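For concreteness, this is the kind of conversion we're after, sketched in Python (the function name and parameters are placeholders, and the real-world frame width/height at a given depth are exactly the values the rest of this question tries to derive):

```python
def pixel_to_mm(x_px, y_px, z_mm, frame_w_px, frame_h_px, fov_w_mm, fov_h_mm):
    # Map a pixel to real-world millimeters, given the frame's real-world
    # width and height (fov_w_mm, fov_h_mm) at depth z_mm.
    # Pixel (0, 0) is assumed to be the top-left corner of the frame.
    x_mm = (x_px / frame_w_px - 0.5) * fov_w_mm
    y_mm = (y_px / frame_h_px - 0.5) * fov_h_mm
    return (x_mm, y_mm, z_mm)
```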
As I understand it, the Kinect's 3D depth is obtained by projecting the X-axis along the camera's horizontal, its Y-axis along the camera's vertical, and its Z-axis directly forward out of the camera's lens. Depth values are then the length of the perpendicular drawn from any given object to the XY-plane. See the picture at the link below (taken from Microsoft's website).
Microsoft Depth Coordinate System Example
We also know that the Kinect's horizontal field of view spans a 117-degree angle.
Using this information, I figured I could take the depth value of any given point, project it onto the x = 0, y = 0 line, and draw a horizontal line parallel to the XY-plane at that depth, intersecting the camera's field of view. I end up with an isosceles triangle, split in half by the Z-axis, whose height is the depth of the object in question. I can then solve for the width of the field of view at that depth with a little trigonometry. My equation is:
W = tan(theta / 2) * h * 2
Where:
- W = field of view width at depth h
- theta = horizontal field of view angle (117 degrees)
- h = depth value
(Sorry, I can't post a picture, I would if I could)
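In code form, the same equation looks like this (a quick Python sketch, just plugging in the 117-degree figure from above):

```python
import math

def frame_width_mm(depth_mm, fov_deg=117.0):
    # W = 2 * h * tan(theta / 2)
    return 2.0 * depth_mm * math.tan(math.radians(fov_deg / 2.0))

print(frame_width_mm(1000.0))  # ~3263.96 mm
```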
Now, solving for a depth value of 1000 mm (1 meter) gives a width of about 3264 mm.
However, when actually LOOKING at the image the camera produces, I get a different value. Namely, I placed a meter stick 1 meter away from the camera and noticed that the width of the frame was at most 1.6 meters, not the 3.264 meters estimated from the calculation.
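As a sanity check (my own back-calculation, not a figure from any spec), inverting the formula on that measurement shows what horizontal FOV a 1.6-meter frame width at 1 meter would actually imply:

```python
import math

def implied_fov_deg(width_mm, depth_mm):
    # Invert W = 2 * h * tan(theta / 2) to solve for theta.
    return math.degrees(2.0 * math.atan(width_mm / (2.0 * depth_mm)))

print(implied_fov_deg(1600.0, 1000.0))  # ~77.3 degrees, nowhere near 117
```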
Is there something I'm missing here? Any help would be appreciated.