What is the best way to fill AVFrame.data

Question

I want to transfer OpenGL framebuffer data to AVCodec as fast as possible.

I've already converted RGB to YUV with a shader and read it back with glReadPixels.

I still need to fill the AVFrame data manually. Is there a better way?

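AVFrame *frame; // allocated elsewhere, planar YUV
// data is the packed YUV buffer read with glReadPixels;
// i is the pixel index into it (presumably y * width + x)
// Y
frame->data[0][y*frame->linesize[0]+x] = data[i*3];
// U
frame->data[1][y*frame->linesize[1]+x] = data[i*3+1];
// V
frame->data[2][y*frame->linesize[2]+x] = data[i*3+2];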

I think it should be possible to read back the data into a PBO, map that to client address space, and reference that memory directly in the AVFrame structure. – derhass, Sep 21 '15 at 18:23
You can already organize the data as needed in the shaders. It might seem a bit unintuitive at first, but it is definitely possible. – derhass, Sep 21 '15 at 18:33

Wagner Patriota · Accepted Answer · 2015-09-21T18:49:30.847

You can use sws_scale.

In fact, you don't need shaders to convert RGB to YUV. Believe me, the performance is not going to be very different.

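// The destination format must be a real pixel format, e.g. AV_PIX_FMT_YUV420P;
// here it is taken from the codec context so it matches what the encoder expects.
swsContext = sws_getContext(WIDTH, HEIGHT, AV_PIX_FMT_RGBA, WIDTH, HEIGHT, codecContext->pix_fmt, SWS_BICUBIC, NULL, NULL, NULL);
sws_scale(swsContext, (const uint8_t * const *)sourcePictureRGB.data, sourcePictureRGB.linesize, 0, codecContext->height, destinyPictureYUV.data, destinyPictureYUV.linesize);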

The data in destinyPictureYUV will be ready to go to the codec.

In this sample, destinyPictureYUV is the AVFrame you want to fill. Try setting it up like this:

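AVFrame *frame = av_frame_alloc(); // must be allocated before the cast below
AVPicture destinyPictureYUV;

// Note: the AVPicture API is deprecated in newer FFmpeg releases.
avpicture_alloc(&destinyPictureYUV, codecContext->pix_fmt, codecContext->width, codecContext->height);

// THIS is what you want probably: copy the data pointers and linesizes into the frame
*reinterpret_cast<AVPicture *>(frame) = destinyPictureYUV;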

With this setup you CAN ALSO fill the frame with the data you already converted to YUV on the GPU, if you prefer. Choose whichever way you want.
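
If you go the "already converted on the GPU" route, the copy into the planes could look roughly like the sketch below. It is only an illustration under some assumptions: the buffer from glReadPixels is tightly packed with 3 bytes per pixel (Y, U, V), the frame is a full-resolution planar format such as AV_PIX_FMT_YUV444P, and the function name split_packed_yuv444 and its parameters are made up for this example. planes and linesize stand in for frame->data and frame->linesize, so the sketch does not need the FFmpeg headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split tightly packed YUV (3 bytes per pixel: Y, U, V)
 * into three planes. planes[] and linesize[] mirror AVFrame.data and
 * AVFrame.linesize for a YUV444P frame; destination rows may be padded. */
void split_packed_yuv444(const uint8_t *packed, int width, int height,
                         uint8_t *const planes[3], const int linesize[3])
{
    /* Each destination row must be able to hold one full row of samples. */
    for (int p = 0; p < 3; p++)
        assert(linesize[p] >= width);

    for (int y = 0; y < height; y++) {
        /* Source rows are assumed to be exactly width * 3 bytes apart. */
        const uint8_t *src = packed + (size_t)y * width * 3;
        /* Destination rows are linesize[p] bytes apart, not width. */
        uint8_t *dst_y = planes[0] + (size_t)y * linesize[0];
        uint8_t *dst_u = planes[1] + (size_t)y * linesize[1];
        uint8_t *dst_v = planes[2] + (size_t)y * linesize[2];

        for (int x = 0; x < width; x++) {
            dst_y[x] = src[3 * x];
            dst_u[x] = src[3 * x + 1];
            dst_v[x] = src[3 * x + 2];
        }
    }
}
```

With a frame allocated as YUV444P, the call would be something like split_packed_yuv444(data, frame->width, frame->height, frame->data, frame->linesize). Computing the row pointers once per row avoids the per-pixel multiplications of the loop in the question. Keep in mind that glReadPixels returns rows bottom to top, so the image may need flipping, and that if the shader writes each plane separately, every row could be copied with a single memcpy instead.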

edited Sep 21 '15 at 18:49

answered Sep 21 '15 at 18:33

Not actually... `sws_scale` is highly optimized. Check the last update I wrote: set up the destinyPictureYUV/frame [with a cast, they are the same thing] and then try to fill it with YUV directly. Compare the performance of both methods. I think it's not going to be so different. – Wagner Patriota Sep 21 '15 at 18:41
I believe the difference will depend on the size of your image. Remember, you have a lot of cores when this is done on the GPU versus fewer cores on the CPU. However, one CPU core is faster than one GPU core. That's why `sws_scale` can sometimes be faster than the GPU. For VERY LARGE images, maybe the GPU will make a difference and be faster, but this is just a bet; I never did a precise benchmark on this. – Wagner Patriota Sep 21 '15 at 18:44
@WagnerPatriota: well, the performance of sws_scale might be quite high, but any proper GPU-based implementation will beat it by orders of magnitude. Furthermore, one can even apply some chroma subsampling directly on the GPU, reducing the amount of memory which has to be read back to client memory. – derhass Sep 21 '15 at 18:46
@WagnerPatriota: the number of cores is quite irrelevant for this conversion. You will be bandwidth-limited both with a CPU and a GPU operation. But GPUs have an architectural advantage for this kind of operation. The memory bandwidth is much higher, and the image data is organized for 2D cache locality. A modern GPU will easily process this at 200 to 300 GB/s, while on a decent desktop CPU you get a tenth of that, if you are lucky. – derhass Sep 21 '15 at 18:52
@derhass, I agree... you are right. As I said in my answer, "it's not gonna have a very different performance", but it will, of course, and I never benchmarked this. But I have fairly good experience with timing some algorithms on GPU versus CPU, and I am sure `sws_scale` is really fast. `sws_scale` FOR SURE is faster for small images. My BET is that for large images, the GPU will be faster. But it's just my bet; I suggest a test! Better than bets :-) – Wagner Patriota Sep 21 '15 at 18:54
@derhass anyway, following the AVFrame setup is what he needs... he may choose to use YUV from the GPU or convert with `sws_scale`. Maybe you are right. Thinking about it, the GPU was not that good when I did my tests... – Wagner Patriota Sep 21 '15 at 19:02
@WagnerPatriota: well. The key thing here is that the original image already is on the GPU. Using the CPU for this task will virtually always be faster than a CPU -> GPU -> CPU transfer of the data, because of the overhead of the transfer (and also a bit because GPUs are optimized for high throughput, not low latency). But when only the actual processing time is considered, I bet that the GPU will be almost always faster, unless the image is unrealistically tiny (like < 16x16 pixels) – derhass Sep 21 '15 at 19:03
Is there an example for the GPU? `sws_scale` uses a lot of CPU and is slow with a big `AVFrame` (1080x2400). Thanks – Trương Quốc Khánh Jul 03 '21 at 18:37
