
I have an SDL2 app with an OpenGL window, and it is well behaved: when it runs, the app synchronizes with my 60 Hz display, and I see 12% CPU usage for the app.

So far so good. But when I add 3D picking by reading a single (!) depth value from the depth buffer (after drawing), the following happens:

  • FPS still at 60
  • CPU usage for the main thread goes to 100%

If I don't do the glReadPixels, CPU usage drops back to 12% again. Why does reading a single value from the depth buffer cause the CPU to burn all its cycles?

My window is created with:

SDL_GL_SetAttribute( SDL_GL_CONTEXT_MAJOR_VERSION, 3 );
SDL_GL_SetAttribute( SDL_GL_CONTEXT_MINOR_VERSION, 2 );
SDL_GL_SetAttribute( SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE );

SDL_GL_SetAttribute( SDL_GL_DOUBLEBUFFER, 1 );
SDL_GL_SetAttribute( SDL_GL_MULTISAMPLEBUFFERS, use_aa ? 1 : 0 );
SDL_GL_SetAttribute( SDL_GL_MULTISAMPLESAMPLES, use_aa ? 4 : 0 );
SDL_GL_SetAttribute( SDL_GL_FRAMEBUFFER_SRGB_CAPABLE, 1 );
SDL_GL_SetAttribute( SDL_GL_DEPTH_SIZE, 24 );

window = SDL_CreateWindow
(
            "Fragger",
            SDL_WINDOWPOS_UNDEFINED,
            SDL_WINDOWPOS_UNDEFINED,
            fbw, fbh,
            SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE | SDL_WINDOW_ALLOW_HIGHDPI
);

My drawing is concluded with:

SDL_GL_SwapWindow( window );

My depth read is performed with:

float depth;
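// blocking 1x1 read: the driver has to wait for the GPU to finish rendering first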
glReadPixels( scrx, scry, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &depth );

My display sync is configured using:

int rv = SDL_GL_SetSwapInterval( -1 );
if ( rv < 0 )
{
    LOGI( "Late swap tearing not available. Using hard v-sync with display." );
    rv = SDL_GL_SetSwapInterval( 1 );
    if ( rv < 0 ) LOGE( "SDL_GL_SetSwapInterval() failed." );
}
else
{
    LOGI( "Can use late vsync swap." );
}

Investigation with 'perf' shows that the cycles are burnt by nVidia's driver, which makes relentless system calls, one of which is sys_clock_gettime(), as can be seen below:

[flame graph: CPU time spent inside the nVidia driver]

I've tried some variations by reading GL_BACK or GL_FRONT, with the same result. I also tried reading just before and just after the window swap, but CPU usage always stays at 100%.

  • Platform: Ubuntu 18.04.1
  • SDL: version 2.0.8
  • CPU: Intel Haswell
  • GPU: nVidia GTX750Ti
  • GL_VERSION: 3.2.0 NVIDIA 390.87

UPDATE

On Intel HD Graphics, the CPU does not spinlock. The glReadPixels is still slow, but the CPU has a low duty cycle (1% or so), compared to a fully loaded CPU with the nVidia driver.

I also tried asynchronous pixel reads via PBOs (Pixel Buffer Objects), but that works only for RGBA values, never for DEPTH values.
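The pattern I tried is, roughly, the following double-buffered PBO read with one frame of latency (a sketch; the pbo array and frame counter are illustrative names, not my exact code):

// One-time setup: two 4-byte PBOs, so the read and the map never
// touch the same buffer in the same frame.
GLuint pbo[2];
glGenBuffers( 2, pbo );
for ( int i = 0; i < 2; ++i )
{
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[i] );
    glBufferData( GL_PIXEL_PACK_BUFFER, sizeof(float), NULL, GL_STREAM_READ );
}

// Per frame: start an asynchronous read into one PBO...
glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[frame & 1] );
glReadPixels( scrx, scry, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, (void*)0 );

// ...and map the other PBO, which holds the previous frame's result.
glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[(frame + 1) & 1] );
float* ptr = (float*) glMapBuffer( GL_PIXEL_PACK_BUFFER, GL_READ_ONLY );
if ( ptr )
{
    depth = *ptr;
    glUnmapBuffer( GL_PIXEL_PACK_BUFFER );
}
glBindBuffer( GL_PIXEL_PACK_BUFFER, 0 );

With GL_RGBA reads this keeps glMapBuffer from stalling; with GL_DEPTH_COMPONENT the map still blocks for me.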

  • How/when do you read your depth? From which buffer? How does the non-vsync'd FPS drop when you do that? It looks like you force the driver to synchronise, since the requested depth is not yet rendered. Logically you probably want the depth from the previous frame, not from the one you're currently rendering. – keltar Nov 15 '18 at 12:06
  • @keltar thanks. I tried glReadBuffer() with both GL_FRONT and GL_BACK, with the same result: 100% CPU usage. Without vsync it goes to 300 fps, but still a lot of cycles are burnt, all in glReadPixels(). – Bram May 03 '19 at 22:40
  • the nvidia driver uses busy waiting _a lot_ when you force a CPU-GPU sync. And actually, there is nothing wrong with that, as it is the most performant option available. The real issue is that you're forcing a synchronization when what you should do is of course an asynchronous readback. There is absolutely no reason for that not to work with depth buffer contents; you should really fix your code for that. – derhass May 05 '19 at 00:27
  • @derhass I think my code is correct: I get non-blocking readpixels when reading RGBA, just not when reading DEPTH. Code is here (a fence-guarded variant is sketched below): https://stackoverflow.com/questions/55994376/non-blocking-glreadpixels-of-depth-values-with-pbo – Bram May 05 '19 at 17:05
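A minimal sketch of that fence-guarded variant, assuming the PBOs from the update above (untested against this driver; sync objects are core in the GL 3.2 context already requested here):

// Issue the async read into a PBO, then drop a fence behind it.
glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[frame & 1] );
glReadPixels( scrx, scry, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, (void*)0 );
GLsync fence = glFenceSync( GL_SYNC_GPU_COMMANDS_COMPLETE, 0 );

// Later (e.g. next frame): poll the fence without blocking, and only
// map the buffer once the GPU has signalled completion.
GLint status = GL_UNSIGNALED;
glGetSynciv( fence, GL_SYNC_STATUS, sizeof(status), NULL, &status );
if ( status == GL_SIGNALED )
{
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[frame & 1] );
    float* ptr = (float*) glMapBuffer( GL_PIXEL_PACK_BUFFER, GL_READ_ONLY );
    if ( ptr )
    {
        depth = *ptr;
        glUnmapBuffer( GL_PIXEL_PACK_BUFFER );
    }
    glBindBuffer( GL_PIXEL_PACK_BUFFER, 0 );
    glDeleteSync( fence );
}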

0 Answers