My goal is to encode the main framebuffer of my Windows machine using NVENC and stream it to my iPad, where it is decoded using the VideoToolbox API.

The code I use to encode the H264 stream is essentially a copy/paste of https://github.com/NVIDIA/video-sdk-samples/tree/master/nvEncDXGIOutputDuplicationSample; the only change is that instead of writing to a file, I send the data over the network.

For decoding I use https://github.com/zerdzhong/SwfitH264Demo/blob/master/SwiftH264/ViewController.swift#L71

The encoding works perfectly when I write all the content to a file: I can run the result through an online h264-to-mp4 converter without issue. The problem is that the decoder gives me the error kVTVideoDecoderBadDataErr in decompressionSessionDecodeFrameCallback.

Here is what I have tried so far:

  • First, using an H264 analyzer I found that the NAL unit order is 7/8/5/5/5/5/1... (SPS, PPS, several IDR slices, then non-IDR slices).
  • I found that NVENC encodes the 7/8/5/5/5/5 units in a single packet.
  • I tried to split this packet into multiple ones on the 4-byte start code (0x00 0x00 0x00 0x01), which gave me the 7/8/5 units separately.
  • As you can see, I only got one type-5 unit, which is around 100KB, while the H264 analyzer says there are four type-5 units (roughly 40KB, 20KB, 30KB, 10KB).
  • Using a hex file viewer I saw that the sequence separating those type-5 units is actually the 3-byte start code (0x00 0x00 0x01). I tried to split on that as well, but I got the exact same VideoToolbox error while decompressing (a splitting sketch follows this list).
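
For reference, here is a minimal sketch of a splitter that accepts both start-code lengths; the function name and the standalone form are mine, not the code I actually ship:

    /// Splits an Annex B buffer into NAL units, accepting both
    /// 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
    /// The returned slices exclude the start codes themselves.
    fn split_nal_units(data: &[u8]) -> Vec<&[u8]> {
        let mut units = Vec::new();
        let mut nal_start: Option<usize> = None;
        let mut i = 0;

        while i + 3 <= data.len() {
            // Detect a start code at position i and remember its length.
            // A 4-byte code never matches the 3-byte test at the same i,
            // because its third byte is 0x00, not 0x01.
            let code_len = if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
                Some(3)
            } else if i + 4 <= data.len()
                && data[i] == 0 && data[i + 1] == 0
                && data[i + 2] == 0 && data[i + 3] == 1
            {
                Some(4)
            } else {
                None
            };

            if let Some(len) = code_len {
                // Close the previous NAL unit, if any.
                if let Some(start) = nal_start {
                    units.push(&data[start..i]);
                }
                i += len;
                nal_start = Some(i);
            } else {
                i += 1;
            }
        }
        // The last NAL unit runs to the end of the buffer.
        if let Some(start) = nal_start {
            units.push(&data[start..]);
        }
        units
    }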

Here is the code I use to separate and send the frames. The protocol is simply PACKET_SIZE -> PACKET_DATA. The Swift code is able to read the NALU types, so I am confident that the framing itself is not the issue:

    unsafe {
        Setup();
        loop {
            CaptureFrame();

            let frame_count = GetDataCount();
            if frame_count == 0 {
                continue;
            }

            for i in 0..frame_count {
                let size = RetrieveDataSize(i as i32);
                let data = RetrieveData(i as i32);
                let data_slice = std::slice::from_raw_parts(data, size);

                let mut last_frame = 0;

                // Scan for 4-byte start codes (0x00 0x00 0x00 0x01) only;
                // 3-byte start codes (0x00 0x00 0x01) are NOT detected here.
                // The bound keeps data_slice[x + 4] in range.
                for x in 0..size.saturating_sub(4) {
                    if data_slice[x] == 0 &&
                        data_slice[x + 1] == 0 &&
                        data_slice[x + 2] == 0 &&
                        data_slice[x + 3] == 1 {
                        let frame_size = x - last_frame;
                        if frame_size > 0 {
                            // Send everything since the previous start code,
                            // prefixed with its little-endian size.
                            let frame_data = &data_slice[last_frame..x];
                            stream.write_all(&u32::to_le_bytes(frame_size as u32)).unwrap();
                            stream.write_all(frame_data).unwrap();
                            println!("SEND MULTIPLE {}", frame_size);
                        }

                        last_frame = x;
                        println!("NALU {}", data_slice[x + 4] & 0x1F);
                    }
                }
                // Send the trailing unit: the whole packet if no start code
                // was found past position 0.
                let frame_size = size - last_frame;
                let frame_data = &data_slice[last_frame..size];
                stream.write_all(&u32::to_le_bytes(frame_size as u32)).unwrap();
                stream.write_all(frame_data).unwrap();
                println!("SEND SINGLE {} {}", last_frame, size);
            }
        }
    }

It could also be related to the texture format: VideoToolbox mentions kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, and the NVENC code mentions YUV420 and NV12; I am unsure whether these are the same or not.
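
From what I can tell, NV12 is one specific memory layout of YUV 4:2:0: a full-resolution Y plane followed by a single interleaved CbCr plane, which matches the biplanar arrangement in the kCVPixelFormatType_420YpCbCr8BiPlanar* names. A small sketch of the layout arithmetic (my own illustration, not from either codebase):

    /// Plane offsets for an NV12 (biplanar YUV 4:2:0) image:
    /// a full-size 8-bit Y plane, then one interleaved CbCr plane
    /// at half horizontal and half vertical chroma resolution.
    fn nv12_plane_layout(width: usize, height: usize) -> (usize, usize, usize) {
        let y_size = width * height;        // luma plane
        let cbcr_size = width * height / 2; // interleaved Cb/Cr samples
        (0, y_size, y_size + cbcr_size)     // (Y offset, CbCr offset, total bytes)
    }

The remaining difference would only be the range flag: the format description below says FullRangeVideo = 0, i.e. video range rather than full range.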

Here is my format description:

Optional(<CMVideoFormatDescription 0x2823dd410 [0x1e0921e20]> {
    mediaType:'vide' 
    mediaSubType:'avc1' 
    mediaSpecific: {
        codecType: 'avc1'       dimensions: 3840 x 2160 
    } 
    extensions: {{
    CVFieldCount = 1;
    CVImageBufferChromaLocationBottomField = Left;
    CVImageBufferChromaLocationTopField = Left;
    CVPixelAspectRatio =     {
        HorizontalSpacing = 1;
        VerticalSpacing = 1;
    };
    FullRangeVideo = 0;
    SampleDescriptionExtensionAtoms =     {
        avcC = {length = 41, bytes = 0x01640033 ffe10016 67640033 ac2b401e ... 68ee3cb0 fdf8f800 };
    };
}}
})
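
Side note on the avcC atom above: in the AVCDecoderConfigurationRecord layout (ISO/IEC 14496-15), the low two bits of the fifth byte are lengthSizeMinusOne. Here that byte is 0xff, so the decoder expects every NAL unit to carry a 4-byte length prefix. A minimal sketch of that check (the function is my own, for illustration):

    /// Reads the NAL length-prefix size from an avcC payload
    /// (AVCDecoderConfigurationRecord): version, profile, compatibility,
    /// level, then a byte whose low two bits are lengthSizeMinusOne.
    fn nal_length_size(avcc: &[u8]) -> Option<usize> {
        if avcc.len() < 5 || avcc[0] != 1 {
            return None; // not a valid configuration record
        }
        Some(((avcc[4] & 0x03) as usize) + 1)
    }

    // For the avcC above (0x01 0x64 0x00 0x33 0xff ...):
    // 0xff & 0x03 == 3, so NAL units must be prefixed with 4-byte lengths.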
  • You are not handling 3 byte start codes. – szatmary Jun 22 '20 at 21:25
  • Alright, so I added handling of the 3-byte codes for the IDR frames. The problem is that the Swift code assumes the start code is always 4 bytes, and changing the nalUnitHeaderLength parameter to 3 in CMVideoFormatDescriptionCreateFromH264ParameterSets gives me another error. – TheMode Jun 22 '20 at 23:20
  • It's not a transform that can be done in place. If you get a 3-byte start code in, you must write a 4-byte size out. – szatmary Jun 23 '20 at 02:14
  • Thanks for the clarification. I added handling for the 3-byte codes along with a debug print: https://hastebin.com/upurofuzic.js (the prints at the end show which frames use 3-byte codes (true) or not). The Swift client is able to read all of them since I prepend a single 0x00 byte to the 3-byte-code packets, but the error is still the same and is thrown for every packet starting from the first NALU-5 frame. – TheMode Jun 23 '20 at 13:12
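
To make szatmary's point concrete: instead of re-sending the start codes, each NAL unit should be written out behind a 4-byte big-endian length, which is the AVCC framing VideoToolbox expects when nalUnitHeaderLength is 4. A minimal sketch, reusing the hypothetical split_nal_units helper from the earlier sketch:

    /// Converts an Annex B buffer (3- or 4-byte start codes) into AVCC
    /// framing: each NAL unit is prefixed with its 4-byte big-endian length.
    fn annex_b_to_avcc(data: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(data.len() + 16);
        for nal in split_nal_units(data) {
            out.extend_from_slice(&(nal.len() as u32).to_be_bytes());
            out.extend_from_slice(nal);
        }
        out
    }

This way each IDR slice travels with its own length, regardless of whether NVENC separated it with a 3-byte or 4-byte start code.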

1 Answer

Alright, so as weird as it sounds, my code works on the simulator but not on my iPad Pro. In the end it does work, so I'll still mark this as the accepted answer.

  • Testing video decoding and encoding on the simulator is not representative, because it uses the Mac's resources. – vpoltave Aug 02 '20 at 07:20
  • What does that mean? That my code may be faulty even though the simulator doesn't give me any error? – TheMode Aug 02 '20 at 22:20
  • Yes, testing decoding/encoding on real devices is the only way to tell that everything is working correctly. – vpoltave Aug 03 '20 at 06:39