
The images produced by the color and depth sensors on the Kinect are slightly out of alignment. How can I transform them to make them line up?

Mr Bell

2 Answers


The key to this is the call to `Runtime.NuiCamera.GetColorPixelCoordinatesFromDepthPixel`.

Here is an extension method for the Runtime class that returns a WriteableBitmap. The bitmap is automatically updated as new frames come in, so using it is really simple:

    kinect = new Runtime();
    kinect.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseSkeletalTracking | RuntimeOptions.UseDepthAndPlayerIndex);
    kinect.DepthStream.Open(ImageStreamType.Depth, 2, ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);
    kinect.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color);
    myImageControl.Source = kinect.CreateLivePlayerRenderer(); 

and here's the code itself:

    using System;
    using System.Diagnostics;
    using System.Windows;
    using System.Windows.Media;
    using System.Windows.Media.Imaging;
    using Microsoft.Research.Kinect.Nui;

    public static class RuntimeExtensions
    {
        public static WriteableBitmap CreateLivePlayerRenderer(this Runtime runtime)
        {
            if (runtime.DepthStream.Width == 0)
                throw new InvalidOperationException("Either open the depth stream before calling this method or use the overload which takes in the resolution that the depth stream will later be opened with.");
            return runtime.CreateLivePlayerRenderer(runtime.DepthStream.Width, runtime.DepthStream.Height);
        }

        public static WriteableBitmap CreateLivePlayerRenderer(this Runtime runtime, int depthWidth, int depthHeight)
        {
            PlanarImage depthImage = new PlanarImage();
            WriteableBitmap target = new WriteableBitmap(depthWidth, depthHeight, 96, 96, PixelFormats.Bgra32, null);
            var depthRect = new Int32Rect(0, 0, depthWidth, depthHeight);

            // keep a reference to the most recent depth frame
            runtime.DepthFrameReady += (s, e) =>
            {
                depthImage = e.ImageFrame.Image;
                Debug.Assert(depthImage.Height == depthHeight && depthImage.Width == depthWidth);
            };

            runtime.VideoFrameReady += (s, e) =>
            {
                // don't do anything if we don't yet have a depth image
                if (depthImage.Bits == null) return;

                byte[] color = e.ImageFrame.Image.Bits;
                byte[] output = new byte[depthWidth * depthHeight * 4];

                // loop over each pixel in the depth image
                int outputIndex = 0;
                for (int depthY = 0, depthIndex = 0; depthY < depthHeight; depthY++)
                {
                    for (int depthX = 0; depthX < depthWidth; depthX++, depthIndex += 2)
                    {
                        // combine the 2 bytes of depth data representing this pixel
                        short depthValue = (short)(depthImage.Bits[depthIndex] | (depthImage.Bits[depthIndex + 1] << 8));

                        // the id of a tracked player is packed into the lower three
                        // bits of the first byte of depth data (0 = no player)
                        int player = depthImage.Bits[depthIndex] & 7;

                        // find the pixel in the color image which matches this coordinate from the depth image
                        int colorX, colorY;
                        runtime.NuiCamera.GetColorPixelCoordinatesFromDepthPixel(
                            e.ImageFrame.Resolution,
                            e.ImageFrame.ViewArea,
                            depthX, depthY,           // depth coordinate
                            depthValue,               // depth value
                            out colorX, out colorY);  // color coordinate

                        // clamp the calculated color location to the bounds of the color image
                        colorX = Math.Max(0, Math.Min(colorX, e.ImageFrame.Image.Width - 1));
                        colorY = Math.Max(0, Math.Min(colorY, e.ImageFrame.Image.Height - 1));

                        // copy the BGR bytes of the matching color pixel and make the
                        // output pixel opaque only where a tracked player was found
                        int colorIndex = 4 * (colorX + (colorY * e.ImageFrame.Image.Width));
                        output[outputIndex++] = color[colorIndex + 0];
                        output[outputIndex++] = color[colorIndex + 1];
                        output[outputIndex++] = color[colorIndex + 2];
                        output[outputIndex++] = player > 0 ? (byte)255 : (byte)0;
                    }
                }
                target.WritePixels(depthRect, output, depthWidth * PixelFormats.Bgra32.BitsPerPixel / 8, 0);
            };
            return target;
        }
    }
Robert Levy
  • Sadly that link is throwing a yellow screen of death my way right now. But I am looking into the method you mentioned – Mr Bell Jul 28 '11 at 01:48
  • @Mr-Bell - I've updated this post with the actual code instead of a link to it – Robert Levy Jul 28 '11 at 03:23
  • This looks like it works. It does seem like calling `GetColorPixelCoordinatesFromDepthPixel` is killing my framerate. – Mr Bell Jul 28 '11 at 03:40
  • Is it possible to call `GetColorPixelCoordinatesFromDepthPixel` for a small number of calibration corners, then do interpolation or extrapolation inside your code? Are those misalignments mostly affine? – rwong Aug 19 '11 at 06:45 (a sketch of this idea follows these comments)
  • @rwong, i don't know - that's a great question. if you post it as a separate question on this site, i'd vote it up – Robert Levy Aug 22 '11 at 15:45
  • @Robert Levy: Please go ahead and ask it yourself. I'm just raising this question out of curiosity; as I do not have a use for Kinect yet. (If I ask it, I wouldn't have any means to verify those answers.) – rwong Aug 22 '11 at 17:03
  • @robert thanks for your solution, the user looks great in this program with the background removed – ravithejag May 28 '12 at 06:02
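
If the per-pixel interop calls really are the bottleneck, here is a rough, untested sketch of rwong's suggestion from the comments above: call `GetColorPixelCoordinatesFromDepthPixel` only at the nodes of a coarse grid and bilinearly interpolate the mapping for the pixels in between. `SparseMappingSketch`, `MapViaGrid`, and the grid step are invented for this illustration, and the approximation assumes the misalignment varies smoothly, which breaks down where depth changes sharply between grid nodes:

    using System;
    using Microsoft.Research.Kinect.Nui;

    public static class SparseMappingSketch
    {
        // Hypothetical helper: computes the depth-to-color mapping exactly at
        // the nodes of a coarse grid (one SDK call per node instead of one per
        // pixel) and bilinearly interpolates everywhere else. depthValues holds
        // the packed 16-bit values already read from the depth frame.
        public static void MapViaGrid(
            Runtime runtime, ImageResolution colorRes, ImageViewArea view,
            short[] depthValues, int width, int height, int step,
            int[] colorX, int[] colorY)
        {
            int gw = (width - 1) / step + 2;
            int gh = (height - 1) / step + 2;
            var gridX = new int[gw * gh];
            var gridY = new int[gw * gh];

            // exact SDK lookups at grid nodes only (roughly 1/step^2 as many calls)
            for (int j = 0; j < gh; j++)
                for (int i = 0; i < gw; i++)
                {
                    int dx = Math.Min(i * step, width - 1);  // clamp border nodes
                    int dy = Math.Min(j * step, height - 1);
                    int cx, cy;
                    runtime.NuiCamera.GetColorPixelCoordinatesFromDepthPixel(
                        colorRes, view, dx, dy, depthValues[dy * width + dx],
                        out cx, out cy);
                    gridX[j * gw + i] = cx;
                    gridY[j * gw + i] = cy;
                }

            // bilinear interpolation for the pixels in between
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                {
                    int i = x / step, j = y / step;
                    float fx = (x - i * step) / (float)step;
                    float fy = (y - j * step) / (float)step;
                    int n = j * gw + i;
                    float topX = gridX[n] + fx * (gridX[n + 1] - gridX[n]);
                    float botX = gridX[n + gw] + fx * (gridX[n + gw + 1] - gridX[n + gw]);
                    float topY = gridY[n] + fx * (gridY[n + 1] - gridY[n]);
                    float botY = gridY[n + gw] + fx * (gridY[n + gw + 1] - gridY[n + gw]);
                    colorX[y * width + x] = (int)(topX + fy * (botX - topX));
                    colorY[y * width + x] = (int)(topY + fy * (botY - topY));
                }
        }
    }

Profiling would be needed to confirm the interop calls are what kills the framerate, and the right grid step is a quality/speed trade-off to be measured, not assumed.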

One way to do this is to assume that the color and depth images have similar variations in them, and to cross-correlate the two images (or smaller versions of them).

  • Pre-whiten the images to get at the underlying variations.
  • Cross-correlate the pre-whitened images or smaller versions of them.
  • The peak position of the cross-correlation will tell you the offset in x and y. A brute-force sketch of this follows.
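
Below is a minimal brute-force sketch of this approach, assuming both frames have already been converted to same-size grayscale byte arrays (for the Kinect this means downsampling the color image to the depth resolution). The `AlignmentEstimator` helper, the gradient-style whitening filter, and the search window are arbitrary choices for illustration; an FFT-based correlation would be faster for larger images:

    using System;

    public static class AlignmentEstimator
    {
        // Estimates a global (dx, dy) shift between two same-size grayscale
        // images: crude pre-whitening, then brute-force cross-correlation over
        // a small search window, taking the peak as the offset.
        public static void EstimateShift(
            byte[] a, byte[] b, int width, int height, int maxShift,
            out int bestDx, out int bestDy)
        {
            float[] wa = Whiten(a, width, height);
            float[] wb = Whiten(b, width, height);

            double best = double.NegativeInfinity;
            bestDx = 0;
            bestDy = 0;
            for (int dy = -maxShift; dy <= maxShift; dy++)
                for (int dx = -maxShift; dx <= maxShift; dx++)
                {
                    // correlate the overlapping interior region at this offset
                    double sum = 0;
                    for (int y = maxShift; y < height - maxShift; y++)
                        for (int x = maxShift; x < width - maxShift; x++)
                            sum += wa[y * width + x] * wb[(y + dy) * width + (x + dx)];
                    if (sum > best) { best = sum; bestDx = dx; bestDy = dy; }
                }
        }

        // crude whitening: subtract the average of the left and upper
        // neighbors, which strips slowly varying content and keeps edges
        private static float[] Whiten(byte[] img, int width, int height)
        {
            var result = new float[width * height];
            for (int y = 1; y < height; y++)
                for (int x = 1; x < width; x++)
                    result[y * width + x] = img[y * width + x]
                        - 0.5f * (img[y * width + x - 1] + img[(y - 1) * width + x]);
            return result;
        }
    }

Note that a single global (dx, dy) only corrects a translation; as the other answer shows, the true mapping also depends on each pixel's depth.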
Peter K.
  • Peter, those are interesting articles. However, I think that this solution might be significantly more empirical. I think it might just be an offset or something like that – Mr Bell Jul 28 '11 at 01:50
  • :-) OK. I'm probably over-thinking it. [I've just been reading this sort of stuff...](http://liu.diva-portal.org/smash/record.jsf?pid=diva2:420400) – Peter K. Jul 28 '11 at 01:57
  • 1
    in the factory, each kinect device is calibrated and the offsets between the cameras is burned into the device's memory. the trick is in finding the right api to make use of that data. right now the official kinect sdk only provides one such api but others are being considered for future releases – Robert Levy Jul 28 '11 at 03:25
  • @Robert: Thanks for the info! Sounds like fun. :-) – Peter K. Jul 28 '11 at 11:51