
I am building a C# .NET Core application that gathers user interaction data and stores it alongside a video feed. I intend to use timed metadata in an MPEG-4 container to store the interaction data together with the H.264-encoded video.

According to the official documentation, it appears to be possible to encode a metadata stream during a transcode operation with the MediaTranscoder. The code I am using has been adapted from a screen recorder example. It works by creating a custom MediaEncodingProfile set up to support metadata, and a MediaStreamSource containing both a VideoStreamDescriptor and a TimedMetadataStreamDescriptor, then using those to perform the encoding with the MediaTranscoder.

public class Encoder
{
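    // Note: 'configuration' (not shown in this snippet) is assumed to be a settings
    // object holding the capture and encode parameters used below: input/output
    // dimensions, bitrate, frame rate and the output file path.
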
    private MediaEncodingProfile GetEncodingProfile()
    {
        var profile = new MediaEncodingProfile();
        var containerEncoding = new ContainerEncodingProperties
        {
            Subtype = MediaEncodingSubtypes.Mpeg4
        };

        var videoEncoding = new VideoEncodingProperties
        {
            Subtype = MediaEncodingSubtypes.H264,
            Width = configuration.Width,
            Height = configuration.Height,
            Bitrate = configuration.BitsPerSecond,
            FrameRate = { Denominator = 1, Numerator = configuration.FramesPerSecond },
            PixelAspectRatio = { Denominator = 1, Numerator = 1 }
        };

        profile.Container = containerEncoding;
        profile.Video = videoEncoding;

        return profile;
    }

    private Tuple<VideoStreamDescriptor, TimedMetadataStreamDescriptor> GetStreamDescriptors()
    {
        var videoEncoding = VideoEncodingProperties.CreateUncompressed(MediaEncodingSubtypes.Bgra8,
                                                                       configuration.InputWidth, configuration.InputHeight);

        var videoStreamDescriptor = new VideoStreamDescriptor(videoEncoding);

        var metadataEncoding = new TimedMetadataEncodingProperties
        {
            Subtype = "{36002D6F-4D0D-4FD7-8538-5680DA4ED58D}"
        };

        byte[] streamFormatData = GetMetadataStreamFormatData(); // Returns some arbitrary bytes

        metadataEncoding.SetFormatUserData(streamFormatData);

        var metadataStreamDescriptor = new TimedMetadataStreamDescriptor(metadataEncoding);

        return new Tuple<VideoStreamDescriptor, TimedMetadataStreamDescriptor>(
                videoStreamDescriptor, metadataStreamDescriptor);
    }
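
    // Sketch only (not part of the original example): the real helper is omitted
    // above and described as returning some arbitrary bytes, so any small payload
    // serves to illustrate the call to SetFormatUserData.
    private byte[] GetMetadataStreamFormatData()
    {
        return new byte[] { 0x01, 0x02, 0x03, 0x04 };
    }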

    private MediaStreamSource GetMediaStreamSource(IMediaStreamDescriptor videoStreamDescriptor,
                                                   IMediaStreamDescriptor metadataStreamDescriptor)
    {
        var mediaStreamSource = new MediaStreamSource(videoStreamDescriptor, metadataStreamDescriptor)
        {
            BufferTime = TimeSpan.FromSeconds(0)
        };

        mediaStreamSource.Starting += OnStart;
        mediaStreamSource.SampleRequested += OnSampleRequested;

        return mediaStreamSource;
    }

    private void OnStart(MediaStreamSource sender, MediaStreamSourceStartingEventArgs args)
    {
        // Intentionally omitted
    }

    private void OnSampleRequested(MediaStreamSource sender, MediaStreamSourceSampleRequestedEventArgs args)
    {
        // This only gets called for the video stream, not the metadata stream
    }
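
    // Sketch only (not in the original example): if SampleRequested were ever raised
    // for the metadata stream, a sample could presumably be served along these lines.
    // 'payload' stands in for the serialized interaction data at this point in time;
    // AsBuffer() comes from System.Runtime.InteropServices.WindowsRuntime.
    private static void ServeMetadataSample(MediaStreamSourceSampleRequest request,
                                            byte[] payload, TimeSpan timestamp)
    {
        request.Sample = MediaStreamSample.CreateFromBuffer(payload.AsBuffer(), timestamp);
    }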

    private MediaTranscoder GetTranscoder()
    {
        var transcoder = new MediaTranscoder { HardwareAccelerationEnabled = true };
        return transcoder;
    }

    public async Task TranscodeAsync()
    {
        var transcoder = GetTranscoder();

        var (videoStreamDescriptor, metadataStreamDescriptor) = GetStreamDescriptors();

        var mediaStreamSource = GetMediaStreamSource(videoStreamDescriptor, metadataStreamDescriptor);
        var encodingProfile = GetEncodingProfile();
        await using var destinationFile = File.Open(configuration.FilePath, FileMode.Create);

        var prepareTranscodeResult = await transcoder.PrepareMediaStreamSourceTranscodeAsync(
                    mediaStreamSource, destinationFile.AsRandomAccessStream(), encodingProfile);
        await prepareTranscodeResult.TranscodeAsync().AsTask();
    }
}

The problem I am facing is that the SampleRequested event is not raised for the timed metadata stream, only for the video stream. During testing I replaced the timed metadata stream with an audio stream, and the SampleRequested event was then raised correctly for both the video stream and the audio stream. I suspect there might be a different way of adding data to a timed metadata stream, perhaps using TimedMetadataTrack and DataCue, but my efforts have so far been unsuccessful.
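
For reference, this is roughly the shape of the TimedMetadataTrack / DataCue code I have been experimenting with (a sketch only: the track id, cue timings, and the interactionBytes payload are placeholders, and AsBuffer() comes from System.Runtime.InteropServices.WindowsRuntime). As far as I can tell these types belong to the playback side (MediaSource / MediaPlaybackItem), and I have not found a way to feed the resulting track into the MediaTranscoder:

var track = new TimedMetadataTrack("interactions", "en", TimedMetadataKind.Data);

var cue = new DataCue
{
    Id = "interaction-0",
    StartTime = TimeSpan.FromSeconds(1),
    Duration = TimeSpan.FromMilliseconds(33),
    Data = interactionBytes.AsBuffer() // serialized interaction data (placeholder)
};
track.AddCue(cue);

// The track can be attached to a MediaSource for playback, but there is no obvious
// equivalent hook on the MediaTranscoder / MediaStreamSource encoding path.
var mediaSource = MediaSource.CreateFromMediaStreamSource(mediaStreamSource);
mediaSource.ExternalTimedMetadataTracks.Add(track);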

What is the correct way of adding timed metadata, which is only available on a per-sample basis during encoding, to a stream when using the MediaTranscoder (and potentially the MediaStreamSource)?

Mark
  • Hello, can you provide a minimal reproducible demo and the system version required by the application? We have noticed your problem and are contacting an engineer for help. – Richard Zhang Jan 20 '20 at 09:06
  • Apologies for the delay. There is a full example available for [download here](https://www.dropbox.com/s/2fjratzftx3qtf9/Transcoding.zip?dl=1). If you build and run the example, the debugger will break in the event handler method, which only gets called to request video samples, not metadata samples. This example runs on .NET Core 3.1.1 and Windows 10 1909 18363.592, but we have also experienced this issue on builds 1903 and 1809. Thank you for your help. – Mark Jan 22 '20 at 18:21
  • @RichardZhang-MSFT What is the status on this? Is there someone I could directly try to get in touch with to have this resolved? – Mark Jan 26 '20 at 18:45
  • Hi, I have reported this problem along with the sample you provided. If there is any new progress, I will post it here as soon as possible. – Richard Zhang Jan 27 '20 at 02:17

1 Answer


After communicating with the engineer, this problem now has an answer.

While the MediaTranscoder does accept timed metadata streams, when the profile is set on the MF Transform Engine it gets dropped down to supporting only one audio stream and one video stream. So using the MediaTranscoder will never work for this scenario.

The related documentation has been updated: Transcode media files.

Richard Zhang