The default framebuffer only retains the topmost color. To get what you're suggesting would require a specific rendering pipeline.
For instance you could:
- Create an offscreen framebuffer of the same dimensions as your target viewport
- Render a depth-only pass to the offscreen framebuffer, storing the depth values in an attached texture
- Re-render the scene with a special shader that only drew pixels where the post-transformation Z values was LESS than the value in the previously recorded depth buffer
The final result of the last render should be the original scene with the top layer stripped off.
Edit:
It would require only a small amount of new code to create the offscreen framebuffer and render a depth only version of the scene to it, and you could use your existing rendering pipeline in combination with that to execute steps 1 and 2.
However, I can't think of any way you could then re-render the scene to get the information you want in step 3 without a shader, because it both the standard depth test plus a test against the provided depth texture. That doesn't mean there isn't one, just that I'm not well versed in GL tricks to think of it.
I can think of other ways of trying to accomplish the same task for specific points on the screen by fiddling with the rendering system, but they're all far more convoluted than just writing a shader.