Faster encoding of real-time 3D graphics with OpenGL and x264

Question

I am working on a system that renders 3D graphics on a server and sends them to a client as compressed video, encoding each frame as soon as it is rendered. I already have the code working, but I feel it could be much faster (and it is already a bottleneck in the system).

Here is what I am doing:

First I grab the framebuffer

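glPixelStorei( GL_PACK_ALIGNMENT, 1 );  // tightly packed rows; the default alignment of 4 pads each RGB row
glReadBuffer( GL_FRONT );
glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer );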
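One detail worth checking: with GL_RGB data and the default GL_PACK_ALIGNMENT of 4, glReadPixels pads every row up to a multiple of 4 bytes. For a width of 1366 a row then occupies 4100 bytes instead of 3 * 1366 = 4098, and any stride you later hand to the flip or to sws_scale has to match. Below is a small sketch of that padding rule for GL_UNSIGNED_BYTE data; `packedRowStride` is a hypothetical helper, not a GL function.

```cpp
#include <cassert>
#include <cstddef>

// Bytes between the starts of consecutive rows written by glReadPixels for
// GL_UNSIGNED_BYTE data. Each row is padded up to a multiple of the current
// GL_PACK_ALIGNMENT (1, 2, 4 or 8; the default is 4).
std::size_t packedRowStride(std::size_t width, std::size_t bytesPerPixel,
                            std::size_t packAlignment)
{
    assert(packAlignment == 1 || packAlignment == 2 ||
           packAlignment == 4 || packAlignment == 8);
    const std::size_t rowBytes = width * bytesPerPixel;
    // Round rowBytes up to the next multiple of packAlignment.
    return (rowBytes + packAlignment - 1) / packAlignment * packAlignment;
}
```

For example, `packedRowStride(1366, 3, 4)` gives 4100 while `packedRowStride(1366, 3, 1)` gives 4098. Calling glPixelStorei(GL_PACK_ALIGNMENT, 1) before the read makes the stride exactly width * 3.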

Then I flip the framebuffer, because there is a weird bug with sws_scale (which I am using for color space conversion) that flips the image vertically when I convert. I am flipping in advance, nothing fancy.

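#include <cstddef>
#include <cstring>
#include <vector>

typedef unsigned char byte;  // assumed; the original relies on an existing byte typedef

// Flips an image in place by swapping row y with row height-1-y.
// The last parameter is a BYTE count (3 for GL_RGB / GL_UNSIGNED_BYTE); the
// old name bitsPerPixel was misleading, and passing 24 would overrun the buffer.
void VerticalFlip(int width, int height, byte* pixelData, int bytesPerPixel)
{
    const std::size_t rowSize = (std::size_t)width * bytesPerPixel;  // assumes unpadded rows
    std::vector<byte> temp(rowSize);  // one row of scratch space, freed automatically

    for (int y = 0; y < height / 2; y++)
    {
        byte* top    = pixelData + (std::size_t)y * rowSize;
        byte* bottom = pixelData + (std::size_t)(height - 1 - y) * rowSize;
        std::memcpy(temp.data(), top, rowSize);
        std::memcpy(top, bottom, rowSize);
        std::memcpy(bottom, temp.data(), rowSize);
    }
}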
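The separate flip pass can usually be avoided. glReadPixels returns rows bottom-up because OpenGL's window origin is the lower-left corner, and a commonly used trick is to give sws_scale a source pointer to the last row plus a negative stride (src[0] = buffer + (height - 1) * stride, srcstride[0] = -stride), so the conversion itself reads the image top-down. That libswscale accepts negative strides is an assumption worth confirming with your FFmpeg version. The sketch below shows the same addressing in plain C++; `copyRows` and `copyFlipped` are hypothetical helpers, not FFmpeg functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `rows` rows of `rowBytes` bytes. Strides may be negative: starting at
// the LAST row with a negative stride walks the image bottom-up.
void copyRows(const std::uint8_t* src, std::ptrdiff_t srcStride,
              std::uint8_t* dst, std::ptrdiff_t dstStride,
              std::size_t rowBytes, std::size_t rows)
{
    for (std::size_t y = 0; y < rows; ++y)
    {
        const std::ptrdiff_t i = static_cast<std::ptrdiff_t>(y);
        // Index from the start pointer so we never form a pointer outside the buffer.
        std::memcpy(dst + i * dstStride, src + i * srcStride, rowBytes);
    }
}

// Produces a top-down copy of a bottom-up image (as returned by glReadPixels)
// without a separate in-place flip.
void copyFlipped(const std::uint8_t* bottomUp, std::uint8_t* topDown,
                 std::size_t stride, std::size_t height)
{
    assert(height > 0);
    const std::uint8_t* lastRow = bottomUp + (height - 1) * stride;
    copyRows(lastRow, -static_cast<std::ptrdiff_t>(stride),
             topDown, static_cast<std::ptrdiff_t>(stride), stride, height);
}
```

In the real pipeline the destination is the YUV planes written by sws_scale, so the flip costs nothing extra.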

Then I convert it to YUV420p

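// Create the context once (e.g. at startup) and reuse it for every frame.
// Older FFmpeg versions spell these PIX_FMT_RGB24 / PIX_FMT_YUV420P.
convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);
const uint8_t *src[1] = { buffer };
int srcstride[1] = { 3 * width };  // bytes per RGB24 row; assumes GL_PACK_ALIGNMENT is 1

sws_scale(convertCtx, src, srcstride, 0, height, pic_in.img.plane, pic_in.img.i_stride);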
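For reference, here is roughly the per-pixel work behind an RGB24 to YUV420P conversion, which is also the arithmetic a GPU shader pass would have to implement if the conversion moved to the GPU. This is a sketch using the widely published integer BT.601 limited-range coefficients and a plain 2x2 average for chroma; sws_scale's exact coefficients, chroma siting and rounding may differ (SWS_FAST_BILINEAR in particular affects chroma filtering), so treat its output as approximate. `rgb24ToYuv420p` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Integer BT.601 limited-range ("studio swing") coefficients: Y in [16, 235],
// U and V in [16, 240]. Right-shifting a negative int is arithmetic on
// mainstream compilers and guaranteed to be so since C++20.
static std::uint8_t rgbToY(int r, int g, int b)
{
    return static_cast<std::uint8_t>(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}
static std::uint8_t rgbToU(int r, int g, int b)
{
    return static_cast<std::uint8_t>(((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128);
}
static std::uint8_t rgbToV(int r, int g, int b)
{
    return static_cast<std::uint8_t>(((112 * r - 94 * g - 18 * b + 128) >> 8) + 128);
}

// Converts packed, top-down RGB24 (rgbStride bytes per row) into planar
// YUV420p with tightly packed planes. Each U/V sample comes from the rounded
// average colour of a 2x2 block, so width and height must be even.
void rgb24ToYuv420p(const std::uint8_t* rgb, std::size_t rgbStride,
                    std::size_t width, std::size_t height,
                    std::uint8_t* yPlane, std::uint8_t* uPlane, std::uint8_t* vPlane)
{
    assert(width % 2 == 0 && height % 2 == 0);

    // Full-resolution luma.
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
        {
            const std::uint8_t* p = rgb + y * rgbStride + x * 3;
            yPlane[y * width + x] = rgbToY(p[0], p[1], p[2]);
        }

    // Quarter-resolution chroma.
    const std::size_t cw = width / 2;
    for (std::size_t cy = 0; cy < height / 2; ++cy)
        for (std::size_t cx = 0; cx < cw; ++cx)
        {
            int r = 0, g = 0, b = 0;
            for (std::size_t dy = 0; dy < 2; ++dy)
                for (std::size_t dx = 0; dx < 2; ++dx)
                {
                    const std::uint8_t* p =
                        rgb + (2 * cy + dy) * rgbStride + (2 * cx + dx) * 3;
                    r += p[0];
                    g += p[1];
                    b += p[2];
                }
            r = (r + 2) / 4;  // rounded average of the four pixels
            g = (g + 2) / 4;
            b = (b + 2) / 4;
            uPlane[cy * cw + cx] = rgbToU(r, g, b);
            vPlane[cy * cw + cx] = rgbToV(r, g, b);
        }
}
```

Pure white maps to Y = 235, U = V = 128, and pure black to Y = 16, which is a quick sanity check against sws_scale's output.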

Then I pretty much just call the x264 encoder. I am already using the zerolatency tune.

// Returns the size of the NAL payload in bytes, 0 if no NAL units were output, negative on error.
int frame_size = x264_encoder_encode(_encoder, &nals, &i_nals, _inputPicture, &pic_out);

My guess is that there is a faster way to capture the frame and convert it to YUV420p. It would be nice to do the YUV420p conversion on the GPU and only then copy the result to system memory, and hopefully there is a way to do the color conversion without the need to flip.
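Converting on the GPU would also halve the readback: a YUV420p frame is width * height * 1.5 bytes, against width * height * 3 for RGB24. The sketch below computes the layout of one frame, assuming the three planes are stored back to back with no row padding. That is one possible layout for a GPU pass to produce, not something OpenGL or x264 prescribes, and `Yuv420pLayout` / `yuv420pLayout` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Layout of one YUV420p frame stored as three consecutive planes
// (Y, then U, then V) with no row padding. Chroma is subsampled 2x2.
struct Yuv420pLayout
{
    std::size_t ySize;    // width * height bytes
    std::size_t uvSize;   // (width / 2) * (height / 2) bytes per chroma plane
    std::size_t uOffset;  // U plane starts right after Y
    std::size_t vOffset;  // V plane starts right after U
    std::size_t total;    // ySize + 2 * uvSize == width * height * 3 / 2
};

Yuv420pLayout yuv420pLayout(std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    Yuv420pLayout l;
    l.ySize = width * height;
    l.uvSize = (width / 2) * (height / 2);
    l.uOffset = l.ySize;
    l.vOffset = l.ySize + l.uvSize;
    l.total = l.ySize + 2 * l.uvSize;
    return l;
}
```

For 1280x720 this is 1,382,400 bytes per frame versus 2,764,800 for RGB24. With such a contiguous buffer, pic_in.img.plane[0], [1] and [2] could point at offsets 0, uOffset and vOffset, with i_stride set to width, width / 2 and width / 2.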

If there is no better way, at least this question may help someone who is trying to do the same thing, since it shows one approach that works.

There are several things that can make this "faster". You can indeed offload some calculations to the GPU, but if your GPU is already the bottleneck, that won't help much. You can also improve this code right away by using PBOs so you don't block the GPU pipeline. "Faster" is relative in this context; first measure what exactly the problem is. — KillianDS, Oct 03 '12 at 17:50
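A minimal way to follow the "measure first" advice is to wrap each stage (readback, flip, conversion, encode) in a wall-clock timer and compare the numbers. This is a sketch; `timeStageMs` is a hypothetical helper. Keep in mind that most GL calls return before the GPU has finished, although glReadPixels into client memory does not return until the pixels have been copied.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `stage` once and returns its wall-clock duration in milliseconds.
template <typename F>
double timeStageMs(F&& stage)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(stage)();
    const auto end = std::chrono::steady_clock::now();
    assert(end >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Usage would look like `double readMs = timeStageMs([&] { glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer); });`, repeated for each step and averaged over many frames.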
Also, do you just want to do screen-scraping here, or can you render things to an FBO instead? It makes a huge difference in what you can do for optimizations. I've been through this whole thing already; it can be quite tricky. — KillianDS, Oct 03 '12 at 17:55
Sorry, I should have been clearer: copying the framebuffer to memory through glReadPixels, flipping it and converting it are, taken together, the bottleneck. After the copy, everything is done in main memory on the CPU. I am rendering to the screen on the server and then capturing it. This is probably silly; if using an FBO lets me avoid going to the display and improves performance at the same time, that would be great. I am not familiar with framebuffer objects, but I could definitely give them a try. I guess I could modify the application so that it renders to an FBO instead of the back buffer. Does that make sense? — cloudraven, Oct 03 '12 at 18:21
The front buffer is possibly the slowest source you can read from. Even the backbuffer is faster, but ideally you want an actual texture (FBO). — ssube, Dec 26 '12 at 21:38
"Then I flip the framebuffer, because there is a weird bug with swsScale"... I think the image is flipped because of OpenGL's coordinate system: (0,0) being at the lower left corner of the screen. — sonofrage, Mar 27 '15 at 04:24

Accepted Answer · edited May 23 '17 at 11:50

First, use asynchronous reads through PBOs (pixel buffer objects). Here is an example. Using two PBOs that work asynchronously speeds up the read, because it avoids stalling the pipeline the way a direct glReadPixels call does. In my app I got an 80% performance boost when I switched to PBOs. Additionally, on some GPUs glGetTexImage() is faster than glReadPixels(), so try it out.
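The two-PBO scheme alternates buffers so that the GPU fills one while the CPU reads the other, at the cost of one frame of extra latency. Below is a sketch of just the index bookkeeping, with the corresponding GL calls shown in comments. `PboPingPong` is a hypothetical class, and the usual setup (two buffers from glGenBuffers, each sized with glBufferData(GL_PIXEL_PACK_BUFFER, size, nullptr, GL_STREAM_READ)) is assumed rather than shown.

```cpp
#include <cstddef>
#include <optional>

// Index bookkeeping for reading frames back through two pixel buffer objects.
// On frame n the GPU is asked to copy into pbo[n % 2], e.g.
//     glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
//     glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, 0);  // byte offset into the PBO
// which typically returns without waiting, while the CPU maps the OTHER PBO,
// which holds frame n - 1, e.g.
//     glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
//     void* p = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
//     /* convert and encode p */
//     glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
class PboPingPong
{
public:
    struct Step
    {
        std::size_t writeIndex;                // PBO that receives this frame
        std::optional<std::size_t> readIndex;  // PBO holding the previous frame, if any
    };

    Step next()
    {
        Step s;
        s.writeIndex = frame_ % 2;
        if (frame_ > 0)
            s.readIndex = (frame_ + 1) % 2;  // same as (frame_ - 1) % 2
        ++frame_;
        return s;
    }

private:
    std::size_t frame_ = 0;
};
```

The first frame only issues a read; from the second frame on, every call both starts a new transfer and hands back the buffer filled one frame earlier.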

But if you really want to take video encoding to the next level, you can do it via CUDA using the NVIDIA codec library. I recently asked the same question, so this can be helpful.
