
I am modifying the CUDA Video Encoder (NVCUVENC) encoding sample found in the SDK samples pack so that the data comes not from external YUV files (as is done in the sample) but from a cudaArray which is filled from a texture.

So the key API method that encodes the frame is:

int NVENCAPI NVEncodeFrame(NVEncoder hNVEncoder, NVVE_EncodeFrameParams *pFrmIn, unsigned long flag, void *pData); 

If I understand correctly, the parameter:

CUdeviceptr dptr_VideoFrame

is supposed to pass the data to encode. But I really haven't understood how to connect it with some texture data on the GPU. The sample source code is very vague about it, as it works with CPU-side YUV file input.

For example, in main.cpp, lines 555-560, there is the following block:

    // If dptrVideoFrame is NULL, then we assume that frames come from system memory, otherwise it comes from GPU memory
    // VideoEncoder.cpp, EncodeFrame() will automatically copy it to GPU Device memory, if GPU device input is specified
    if (pCudaEncoder->EncodeFrame(efparams, dptrVideoFrame, cuCtxLock) == false)
    {
        printf("\nEncodeFrame() failed to encode frame\n");
    }

So, from the comment, it seems that dptrVideoFrame should be filled with YUV data coming from the device in order to encode the frame. But there is no place where it is explained how to do so.

UPDATE:

I would like to share some findings. First, I managed to encode data from the framebuffer texture. The problem now is that the output video is a mess:

(screenshot: garbled encoder output)

This is the desired result:

(screenshot: the correctly rendered scene)

Here is what I do :

On the OpenGL side I have 2 custom FBOs. The scene is first rendered normally into the first one. Then the texture from the first FBO is used to render a screen quad into the second FBO, doing RGB -> YUV conversion in the fragment shader.

The texture attached to the second FBO is then mapped to a CUDA graphics resource. Then I encode the current frame like this:

void CUDAEncoder::Encode(){
    NVVE_EncodeFrameParams      efparams;
    efparams.Height           = sEncoderParams.iOutputSize[1];
    efparams.Width            = sEncoderParams.iOutputSize[0];
    efparams.Pitch            = (sEncoderParams.nDeviceMemPitch ? sEncoderParams.nDeviceMemPitch : sEncoderParams.iOutputSize[0]);
    efparams.PictureStruc     = (NVVE_PicStruct)sEncoderParams.iPictureType;
    efparams.SurfFmt          = (NVVE_SurfaceFormat)sEncoderParams.iSurfaceFormat;
    efparams.progressiveFrame = (sEncoderParams.iSurfaceFormat == 3) ? 1 : 0;
    efparams.repeatFirstField = 0;
    efparams.topfieldfirst    = (sEncoderParams.iSurfaceFormat == 1) ? 1 : 0;


    if(_curFrame > _framesTotal){
        efparams.bLast=1;
    }else{
        efparams.bLast=0;
    }

    //----------- get cuda array from the texture resource  -------------//

    checkCudaErrorsDrv(cuGraphicsMapResources(1, &_cutexResource, NULL));
    checkCudaErrorsDrv(cuGraphicsSubResourceGetMappedArray(&_cutexArray, _cutexResource, 0, 0));
    /////////// copy data into dptrvideo frame //////////


    // LUMA  based on CUDA SDK sample//////////////
    CUDA_MEMCPY2D pcopy;
    memset((void *)&pcopy, 0, sizeof(pcopy));
    pcopy.srcXInBytes          = 0;
    pcopy.srcY                 = 0;
    pcopy.srcHost              = NULL;
    pcopy.srcDevice            = 0;
    pcopy.srcPitch             = efparams.Width;
    pcopy.srcArray             = _cutexArray; // the CUDA array mapped from the FBO texture

    /// destination //////
    pcopy.dstXInBytes          = 0;
    pcopy.dstY                 = 0;
    pcopy.dstHost              = 0;
    pcopy.dstArray             = 0;
    pcopy.dstDevice            = dptrVideoFrame;
    pcopy.dstPitch             = sEncoderParams.nDeviceMemPitch;

    pcopy.WidthInBytes         = sEncoderParams.iInputSize[0];
    pcopy.Height               = sEncoderParams.iInputSize[1];

    pcopy.srcMemoryType        = CU_MEMORYTYPE_ARRAY;
    pcopy.dstMemoryType        = CU_MEMORYTYPE_DEVICE;

    // CHROMA   based on CUDA SDK sample/////

    CUDA_MEMCPY2D pcChroma;
    memset((void *)&pcChroma, 0, sizeof(pcChroma));
    pcChroma.srcXInBytes        = 0;
    pcChroma.srcY               = 0; // setting this to sEncoderParams.iInputSize[1] << 1 (the U/V chroma offset used in the original CUDA SDK sample) triggers an "invalid value" error from CUDA here
    pcChroma.srcHost            = NULL;
    pcChroma.srcDevice          = 0;
    pcChroma.srcArray           = _cutexArray;
    pcChroma.srcPitch           = efparams.Width >> 1; // chroma is subsampled by 2 (U/V are next to each other)

    pcChroma.dstXInBytes        = 0;
    pcChroma.dstY               = sEncoderParams.iInputSize[1] << 1; // chroma offset (srcY*srcPitch now points to the chroma planes)
    pcChroma.dstHost            = 0;
    pcChroma.dstDevice          = dptrVideoFrame;
    pcChroma.dstArray           = 0;
    pcChroma.dstPitch           = sEncoderParams.nDeviceMemPitch >> 1;

    pcChroma.WidthInBytes       = sEncoderParams.iInputSize[0] >> 1;
    pcChroma.Height             = sEncoderParams.iInputSize[1]; // U/V are sent together

    pcChroma.srcMemoryType      = CU_MEMORYTYPE_ARRAY;
    pcChroma.dstMemoryType      = CU_MEMORYTYPE_DEVICE;

    checkCudaErrorsDrv(cuvidCtxLock(cuCtxLock, 0));
    checkCudaErrorsDrv(cuMemcpy2D(&pcopy));
    checkCudaErrorsDrv(cuMemcpy2D(&pcChroma));
    checkCudaErrorsDrv(cuvidCtxUnlock(cuCtxLock, 0));
    //=============================================

    // If dptrVideoFrame is NULL, then we assume that frames come from system memory, otherwise it comes from GPU memory
    // VideoEncoder.cpp, EncodeFrame() will automatically copy it to GPU Device memory, if GPU device input is specified
    if (_encoder->EncodeFrame(efparams, dptrVideoFrame, cuCtxLock) == false)
    {
        printf("\nEncodeFrame() failed to encode frame\n");
    }
    checkCudaErrorsDrv(cuGraphicsUnmapResources(1, &_cutexResource, NULL));
    //  computeFPS();

    if(_curFrame > _framesTotal){
        _encoder->Stop();
        exit(0);
    }
    _curFrame++;

}

I set the encoder params from the .cfg files included with the CUDA SDK encoder sample, here the 704x480-h264.cfg setup. I tried all of them and always get a similarly ugly result.

I suspect the problem is somewhere in the CUDA_MEMCPY2D parameter setup for the luma and chroma copies: maybe wrong pitch, width, or height dimensions. I set the viewport to the same size as the video (704x480) and compared the params to those used in the CUDA SDK sample, but got no clue where the problem is. Anyone?

Michael IV
  • I don't have any experience with the video encoder, but looking at the API, it seems that you won't be able to pass a CUDA array. Pitched linear memory should work, but a texture probably won't. – talonmies Mar 05 '13 at 11:31
  • But I should probably be able to get data from a texture into pitched linear memory? – Michael IV Mar 05 '13 at 12:32
  • That would be the likely solution. `cudaMemcpy3D` might be useful for that, otherwise a simple kernel would also work. – talonmies Mar 05 '13 at 12:47
  • Should probably be `cuMemcpy2D`, as it's driver-API based. – Michael IV Mar 05 '13 at 17:35
  • I need official confirmation that it is feasible, because most of the stuff I found on the topic deals with CPU input of YUV files. Will start a bounty.... – Michael IV Mar 06 '13 at 17:28
  • If you need official confirmation, then contact NVIDIA. This isn't their developer or support channel. That is https://devtalk.nvidia.com/ – talonmies Mar 06 '13 at 17:37
  • Ha ha, guess what, those forums are pretty dead... otherwise I wouldn't ask it here... – Michael IV Mar 06 '13 at 17:46
  • @MichaelIV you can't have an official confirmation here, for example to sue nVidia if it didn't work :D, but some of the great people working there check this site regularly, so you can wait and hear their opinion. – Soroosh Bateni Mar 07 '13 at 12:40
  • Haha, well, that is not my intention XD I need a little help to figure out passing data from the device into the encoder :) – Michael IV Mar 07 '13 at 15:21
  • @MichaelIV actually the nvidia forums aren't dead, and if you want to get official confirmation from nvidia employees, that would be the place to go (njuffa is quite active there, among others). – alrikai Mar 10 '13 at 23:40
  • Sorry to disappoint you. I asked the same there a couple of weeks ago. Not a single answer. The CUDA forum can be considered dead, as from 4 questions I asked I got zero responses. If you have a useful answer put it here, otherwise let me do my work ;) – Michael IV Mar 11 '13 at 06:35
  • How do you retrieve the textures from the GPU? Maybe what you need is to deal with the `NVVE_EncodeFrameParams` param instead and fill it correctly – like done for example [here](http://doubango.googlecode.com/svn-history/r653/branches/2.0/doubango/tinyDAV/src/codecs/h264/tdav_codec_h264_cuda.cxx). (Search the text for `efparams`.) – rkellerm Mar 12 '13 at 13:15
  • Ok, currently I do this: fill a texture2D with the rendered content, then on the CUDA side map it to an image resource and read it into a CUarray. Then I copy the array data into dptrVideoFrame. I do get it encoded, but the resulting video looks a sheer mess. – Michael IV Mar 12 '13 at 14:10
  • I looked through your demo; it uses the same approach as the CUDA SDK sample: loading data from the CPU. In my case it is trickier, I suppose. – Michael IV Mar 12 '13 at 15:09
  • Do you use NVSetParamValue to set NVVE_DEVICE_MEMORY_INPUT as 1? Why do you use cuGraphicsResourceGetMappedArray instead of cuGraphicsResourceGetMappedPointer? If you use cuGraphicsResourceGetMappedPointer, the function will return a device pointer that can be passed to the encoder directly. – harrism Apr 09 '13 at 05:43
  • Yes, I do set NVVE_DEVICE_MEMORY_INPUT to 1. For your second question: I map to an array because I map the OpenGL FBO texture. Will it also be OK with a mapped pointer? Though I guess the final problem is with the pixel layout that comes from OpenGL... – Michael IV Apr 09 '13 at 06:56

2 Answers


First: I messed around with the Cuda Video Encoder and had lots of trouble too. But it looks to me as if you convert to YUV as a one-to-one per-pixel conversion (like AYUV 4:4:4). Afaik you need the correct kind of YUV with padding and subsampling (color values shared by more than one pixel, like 4:2:0). A good overview of YUV alignments can be seen here:

http://msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx

As far as I remember, you have to use the NV12 alignment for the Cuda Encoder.

thewhiteambit

The nvEncoder application is used for codec conversion; it uses CUDA for processing on the GPU and the nvEncoder API for communicating with the hardware. The application logic reads YUV data into an input buffer, stores that content in memory, and then starts encoding the frames, writing each encoded frame to the output file in parallel.

Handling of the input buffer is done in the nvRead function, which is available in nvFileIO.h.

If any other help is required, leave a message here...

neer