I am working on a system that renders 3D graphics on the server and streams them to a client as compressed video as soon as each frame is rendered. The code already works, but I feel it could be much faster (and it is already a bottleneck in the system).
Here is what I am doing:
First I grab the framebuffer:
glReadBuffer( GL_FRONT );
glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer );
Then I flip the framebuffer vertically, because glReadPixels returns the image bottom-up while sws_scale (which I am using for the colorspace conversion) treats the first row as the top, so the converted image would otherwise come out upside down. I am simply flipping in advance, nothing fancy:
void VerticalFlip(int width, int height, uint8_t* pixelData, int bytesPerPixel)
{
    const int rowSize = width * bytesPerPixel;   // 3 bytes per pixel for GL_RGB
    uint8_t* temp = new uint8_t[rowSize];
    // Swap row y with its mirror row (height - 1 - y), working inwards from both ends.
    for (int y = 0; y < height / 2; y++)
    {
        uint8_t* top    = &pixelData[y * rowSize];
        uint8_t* bottom = &pixelData[(height - 1 - y) * rowSize];
        memcpy(temp, top, rowSize);
        memcpy(top, bottom, rowSize);
        memcpy(bottom, temp, rowSize);
    }
    delete[] temp;
}
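I call it right after the readback, with 3 bytes per pixel since the buffer holds GL_RGB data:
VerticalFlip(width, height, buffer, 3);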
Then I convert it to YUV420p:
convertCtx = sws_getContext(width, height, PIX_FMT_RGB24, width, height, PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);
uint8_t *src[3] = {buffer, NULL, NULL};
int srcstride = width * 3; // bytes per row of the tightly packed RGB24 buffer
sws_scale(convertCtx, src, &srcstride, 0, height, pic_in.img.plane, pic_in.img.i_stride);
Then I pretty much just call the x264 encoder; I am already using the zerolatency tune:
int frame_size = x264_encoder_encode(_encoder, &nals, &i_nals, _inputPicture, &pic_out);
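For context, zerolatency is enabled through the standard x264 preset/tune call when the encoder is created, roughly like this (the "veryfast" preset value here is a placeholder rather than my exact setting):
x264_param_t param;
x264_param_default_preset(&param, "veryfast", "zerolatency"); // preset value is a placeholder
param.i_width  = width;
param.i_height = height;
param.i_csp    = X264_CSP_I420;  // matches the YUV420p planes filled by sws_scale
_encoder = x264_encoder_open(&param);
x264_picture_alloc(&pic_in, X264_CSP_I420, width, height);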
My guess is that there should be a faster way to do this, both for capturing the frame and for converting it to YUV420p. It would be nice to do the YUV420p conversion on the GPU and only copy the result to system memory afterwards, and hopefully there is a way to do the color conversion without the extra flip pass.
If there is no better way, at least this question may help someone who is trying to do the same thing to do it the way I did.