My goal is to use the Tiny YOLOv3 model to perform real-time object detection on the HoloLens. I want to include the model as an ONNX file directly in the project and compute the predictions on the HoloLens itself. To do so, I plan to use the Windows.Media and Windows.AI.MachineLearning libraries as a pipeline between the camera and my predictions.

Following this tutorial, I can capture frames as VideoFrame objects and convert them to ImageFeatureValue to match the model's input type. My remaining issue is the input shape: YOLO models expect a 3x416x416 frame, and I can't find any documentation on resizing a VideoFrame or an ImageFeatureValue.

Thank you very much for your help.

using (var frameReference = CameraFrameReader.TryAcquireLatestFrame())
using (var videoFrame = frameReference?.VideoMediaFrame?.GetVideoFrame())
await ModelHelper.EvaluateVideoFrameAsync(videoFrame).ConfigureAwait(false);

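public async Task EvaluateVideoFrameAsync(VideoFrame frame)
{
    if (frame != null)
    {
        try
        {
            // Wrap the captured frame as the model's image input
            ModelInput inputData = new ModelInput();
            inputData.image = ImageFeatureValue.CreateFromVideoFrame(frame);
            //TODO: CHANGE SIZE FRAME
            var output = await Model.EvaluateAsync(inputData).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // A try block needs a catch or finally to compile; log and continue
            System.Diagnostics.Debug.WriteLine($"Evaluation failed: {ex.Message}");
        }
    }
}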
OD4ZeWin

1 Answer

I have no experience with the Windows Machine Learning API or the ImageFeatureValue class. But when I needed to resize frames from the HoloLens, I had to use SoftwareBitmap instead of VideoFrame. I then used BitmapEncoder to resize them and converted the result back to a VideoFrame:

private async Task<SoftwareBitmap> ResizeBitmap(SoftwareBitmap softwareBitmap, uint width, uint height)
{
    using (InMemoryRandomAccessStream stream = new InMemoryRandomAccessStream())
    {
        BitmapEncoder encoder = await BitmapEncoder.CreateAsync(BitmapEncoder.BmpEncoderId, stream);

        encoder.SetSoftwareBitmap(softwareBitmap);

        encoder.BitmapTransform.ScaledWidth = width;
        encoder.BitmapTransform.ScaledHeight = height;
        encoder.BitmapTransform.InterpolationMode = BitmapInterpolationMode.NearestNeighbor;

        await encoder.FlushAsync();

        BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);

        return await decoder.GetSoftwareBitmapAsync(softwareBitmap.BitmapPixelFormat, softwareBitmap.BitmapAlphaMode);
    }
}

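var inputBitmap = frameReference.VideoMediaFrame.SoftwareBitmap;
// ResizeBitmap is async, so await it to get the resized SoftwareBitmap
var outputBitmap = await ResizeBitmap(inputBitmap, your_width, your_height);

// Pass the resized bitmap instance (not the SoftwareBitmap type name)
var outputVideoFrame = VideoFrame.CreateWithSoftwareBitmap(outputBitmap);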
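
For the 416x416 input mentioned in the question, stretching the camera frame straight to a square distorts it, so one option is to scale it to fit inside the square while keeping the aspect ratio. The sketch below only computes the target width and height. FrameSizing and FitInside are hypothetical names, not part of any Windows API, and the 1280x720 frame size in the usage note is an assumed example. Whether the model expects a padded or a stretched input depends on how it was exported, so treat this as one possibility rather than the required preprocessing.

```csharp
using System;

public static class FrameSizing
{
    // Hypothetical helper: returns the largest width/height that keeps the
    // source aspect ratio and fits inside a target x target square.
    public static (uint Width, uint Height) FitInside(uint srcWidth, uint srcHeight, uint target = 416)
    {
        if (srcWidth == 0 || srcHeight == 0)
            throw new ArgumentException("Source dimensions must be non-zero.");

        // Use the smaller of the two scale factors so both sides fit.
        double scale = Math.Min((double)target / srcWidth, (double)target / srcHeight);

        // Round to whole pixels and never return a zero-sized side.
        uint width = Math.Max(1u, (uint)Math.Round(srcWidth * scale));
        uint height = Math.Max(1u, (uint)Math.Round(srcHeight * scale));
        return (width, height);
    }
}
```

For example, FrameSizing.FitInside(1280, 720) gives a width of 416 and a height of 234, which could be passed as the width and height arguments of ResizeBitmap. The remaining rows would still need padding to reach 416x416 if the model requires an exact square.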
chrissi_gsa