32

I'm doing ray casting in the fragment shader. I can think of a couple ways to draw a fullscreen quad for this purpose. Either draw a quad in clip space with the projection matrix set to the identity matrix, or use the geometry shader to turn a point into a triangle strip. The former uses immediate mode, deprecated in OpenGL 3.2. The latter I use out of novelty, but it still uses immediate mode to draw a point.

May Oakes
  • 4,359
  • 5
  • 44
  • 51
  • 3
    The geometry shader to generate a quad from a point sounds like overkill if you just need a single quad. Just draw two triangles or a triangle strip. Those four vertices won't hurt you, at least not as hard as a special geometry shader for something that simple. – Christian Rau Oct 14 '11 at 18:14

6 Answers

36

I'm going to argue that the most efficient approach is to draw a single "full-screen" triangle. For a triangle to cover the full screen, it needs to be bigger than the actual viewport. In NDC (and also in clip space, if we set w=1), the viewport is always the [-1,1] square. For a triangle to just cover this area completely, two of its sides need to be twice as long as the viewport rectangle, so that the hypotenuse passes exactly through the corner of the viewport opposite the triangle's right angle; hence we can, for example, use the following coordinates (in counter-clockwise order): (-1,-1), (3,-1), (-1,3).

We also do not need to worry about the texcoords. To get the usual normalized [0,1] range across the visible viewport, we just have to scale the corresponding texcoords for the vertices by the same factor of two, e.g. (0,0), (2,0), (0,2), and the barycentric interpolation will yield exactly the same result for any viewport pixel as it would with a quad.

This approach can of course be combined with attribute-less rendering as suggested in demanze's answer:

out vec2 texcoords; // texcoords are in the normalized [0,1] range for the viewport-filling quad part of the triangle
void main() {
    vec2 vertices[3] = vec2[3](vec2(-1, -1), vec2(3, -1), vec2(-1, 3));
    gl_Position = vec4(vertices[gl_VertexID], 0, 1);
    texcoords = 0.5 * gl_Position.xy + vec2(0.5);
}

Why will a single triangle be more efficient?

This is not about the one saved vertex shader invocation and the one less triangle to handle at the front end. The most significant effect of using a single triangle is that there are fewer fragment shader invocations.

Real GPUs always invoke the fragment shader for 2x2 pixel sized blocks ("quads") as soon as a single pixel of the primitive falls into such a block. This is necessary for calculating the window-space derivative functions (those are also implicitly needed for texture sampling, see this question).

If the primitive does not cover all 4 pixels in that block, the remaining fragment shader invocations will do no useful work (apart from providing the data for the derivative calculations) and will be so-called helper invocations (which can even be detected via the gl_HelperInvocation GLSL built-in variable). See also Fabian "ryg" Giesen's blog article for more details.

If you render a quad with two triangles, both will have one edge going diagonally across the viewport, and on both triangles, you will generate a lot of useless helper invocations at the diagonal edge. The effect will be worst for a perfectly square viewport (aspect ratio 1). If you draw a single triangle, there will be no such diagonal edge (it lies outside of the viewport and won't concern the rasterizer at all), so there will be no additional helper invocations.
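
As a rough back-of-the-envelope illustration (an estimate, not a measurement): a 1024x1024 viewport has about a million pixels, i.e. roughly 262,000 of those 2x2 blocks, and the diagonal edge of a two-triangle quad passes through only on the order of a thousand of them. Each straddled block gets rasterized once per triangle, so the quad costs a few thousand extra invocations (partly helpers) compared to the single triangle - a fraction of a percent, which already hints at the answer to the "is it worth it?" question below.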

Wait a minute, if the triangle extends across the viewport boundaries, won't it get clipped and actually put more work on the GPU?

If you read the textbook materials about graphics pipelines (or even the GL spec), you might get that impression. But real-world GPUs use different approaches, such as guard-band clipping. I won't go into detail here (that would be a topic on its own; have a look at Fabian "ryg" Giesen's fine blog article for details), but the general idea is that the rasterizer will produce fragments only for pixels inside the viewport (or scissor rect) anyway, no matter whether the primitive lies completely inside it or not, so we can simply throw bigger triangles at it if both of the following are true:

  • a) the triangle only extends beyond the 2D top/bottom/left/right clipping planes (as opposed to the z-dimension near/far ones, which are trickier to handle, especially because vertices may also lie behind the camera)

  • b) the actual vertex coordinates (and all intermediate calculation results the rasterizer might be doing on them) are representable in the internal data formats the GPU's hardware rasterizer uses. The rasterizer will use fixed-point data types of implementation-specific width, while vertex coords are 32-bit single-precision floats. (That is basically what defines the size of the guard band.)

Our triangle is only a factor of 3 bigger than the viewport, so we can be very sure that there is no need to clip it at all.

But is it worth it?

Well, the savings on fragment shader invocations are real (especially when you have a complex fragment shader), but the overall effect might be barely measurable in a real-world scenario. On the other hand, the approach is not more complicated than using a full-screen quad, and it uses less data, so even if it might not make a huge difference, it won't hurt, so why not use it?

Could this approach be used for all sorts of axis-aligned rectangles, not just fullscreen ones?

In theory, you can combine this with the scissor test to draw some arbitrary axis-aligned rectangle (and the scissor test will be very efficient, as it just limits which fragments are produced in the first place; it isn't a real "test" in HW that discards fragments). However, this requires you to change the scissor parameters for each rectangle you want to draw, which implies a lot of state changes and limits you to a single rectangle per draw call, so doing so won't be a good idea in most scenarios.
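
For illustration, a minimal sketch of that combination (the rectangle values and the emptyVAO name are made up, not part of this answer):

glEnable(GL_SCISSOR_TEST);
glScissor(100, 50, 256, 128);        // window-space rectangle the fragments are limited to
glBindVertexArray(emptyVAO);         // attribute-less draw still needs a bound VAO in core profile
glDrawArrays(GL_TRIANGLES, 0, 3);    // the same big triangle as above
glDisable(GL_SCISSOR_TEST);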

Yun
  • 3,056
  • 6
  • 9
  • 28
derhass
  • 43,833
  • 2
  • 57
  • 78
  • 1
    I like this answer not only because of performance; it's also easier to implement! – tuket Mar 19 '21 at 21:43
  • This answer deserves much more respect and recognition... I got a boost of around 15 fps because of my shaders being extremely complex. Btw, in a blur method, I was also able to somewhat see the diagonal of the quad and I still don't know why, but after that, when I started using a triangle, I didn't have to worry about it. Thanks a lot for the time that you put into this answer. – Mudit Bhatia Sep 24 '21 at 18:54
  • 1
    >" so why not using it". Because it obfusticates what you're trying to do. If you're drawing with some ridiculously heavy shaders then sure, but otherwise you're over optimizing and making your code less readable for the next person who has to figure out your function that's supposedly drawing a quad is only using 3 vertices and why the texture coordinate calculations don't seem to make sense. – gman Oct 25 '21 at 23:08
  • 2
    If someone stuck on draw code for this answer: `GLuint emptyVAO; glGenVertexArrays(1, &emptyVAO); glBindVertexArray(emptyVAO); glDrawArrays(GL_TRIANGLES, 0, 3);` (for empty VAO creation thanks [geenux](https://stackoverflow.com/users/904758/geenux)). – Van der Deken Jan 21 '22 at 17:40
21

You can send two triangles creating a quad, with their vertex attributes set to -1/1 respectively.

You do not need to multiply them with any matrix in the vertex/fragment shader.

Here are some code samples, simple as it is :)

Vertex Shader:

const vec2 madd=vec2(0.5,0.5);
attribute vec2 vertexIn;
varying vec2 textureCoord;
void main() {
   textureCoord = vertexIn.xy*madd+madd; // scale vertex attribute to [0-1] range
   gl_Position = vec4(vertexIn.xy,0.0,1.0);
}

Fragment Shader :

uniform sampler2D t; // the texture to display on the quad
varying vec2 textureCoord;
void main() {
   vec4 color1 = texture2D(t, textureCoord);
   gl_FragColor = color1;
}
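
For completeness, a minimal sketch of the client-side setup that could feed these shaders (attribute location 0 for vertexIn is an assumption here; query it with glGetAttribLocation in a real program, and bind a VAO first in a core profile):

// four clip-space corners, drawn as a triangle strip (= two triangles)
static const GLfloat quad[] = {
    -1.0f, -1.0f,
     1.0f, -1.0f,
    -1.0f,  1.0f,
     1.0f,  1.0f,
};

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);

glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (const void*)0);

glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);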
AdilYalcin
  • 1,177
  • 10
  • 16
  • 3
    By the way, this follows your former idea, yet it is definitely not deprecated in OpenGL 3.2. You can still send vertex attributes through a vertex array/buffer, etc. OpenGL is still an immediate mode rendering API. – AdilYalcin Apr 07 '10 at 10:23
16

No need to use a geometry shader, a VBO or any memory at all.

A vertex shader can generate the quad.

layout(location = 0) out vec2 uv;

void main()
{
    // gl_VertexID 0..5 maps to the (x,y) corners (0,0) (1,0) (1,1) and (1,1) (0,1) (0,0),
    // i.e. two counter-clockwise triangles covering the whole quad
    float x = float(((uint(gl_VertexID) + 2u) / 3u) % 2u);
    float y = float(((uint(gl_VertexID) + 1u) / 3u) % 2u);

    // expand from [0,1] to clip space [-1,1]; uv stays in [0,1]
    gl_Position = vec4(-1.0f + x*2.0f, -1.0f + y*2.0f, 0.0f, 1.0f);
    uv = vec2(x, y);
}

Bind an empty VAO. Send a draw call for 6 vertices.
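
A minimal sketch of those two steps (emptyVAO is just an illustrative name):

GLuint emptyVAO;
glGenVertexArrays(1, &emptyVAO);
glBindVertexArray(emptyVAO);          // no attributes needed, but core profile requires a bound VAO
glDrawArrays(GL_TRIANGLES, 0, 6);     // 6 vertex shader invocations -> two triangles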

unknown
  • 368
  • 4
  • 12
14

To output a fullscreen quad, a geometry shader can be used:

#version 330 core

layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

out vec2 texcoord;

void main() 
{
    gl_Position = vec4( 1.0, 1.0, 0.5, 1.0 );
    texcoord = vec2( 1.0, 1.0 );
    EmitVertex();

    gl_Position = vec4(-1.0, 1.0, 0.5, 1.0 );
    texcoord = vec2( 0.0, 1.0 ); 
    EmitVertex();

    gl_Position = vec4( 1.0,-1.0, 0.5, 1.0 );
    texcoord = vec2( 1.0, 0.0 ); 
    EmitVertex();

    gl_Position = vec4(-1.0,-1.0, 0.5, 1.0 );
    texcoord = vec2( 0.0, 0.0 ); 
    EmitVertex();

    EndPrimitive(); 
}

The vertex shader is just empty:

#version 330 core

void main()
{
}

To use this shader, you can issue a dummy draw command with an empty VAO bound (no VBO is needed):

glDrawArrays(GL_POINTS, 0, 1);
Dimitry Leonov
  • 319
  • 2
  • 4
  • 1
    Why is the clip-space Z 0.5 instead of 0.0? And why would you use this when drawing an actual quad (or as others have pointed out, a large triangle) is much easier and less expensive? – Nicol Bolas Feb 18 '12 at 17:03
  • Z coordinate can be 0.0, you are right. As for second question, it's really less expensive for CPU, and is more flexible, as one can change shader with no program rebuild. – Dimitry Leonov Feb 22 '12 at 16:09
  • How much less expensive? Is it something that would actually be noticeable and measurable? – Nicol Bolas Feb 22 '12 at 16:29
  • If you are drawing a screen aligned quad as part of deferred rendering then wouldn't you need to find where on that triangle to place the uv coords? Whereas with a quad, square tri-strip, the vertex positions and hence uv {0.0, 0.0 ... 1.0, 1.0} map onto the vertex positions. – ste3e Apr 20 '12 at 01:45
  • How do you go about creating the VBO? Everything I tried up till now didn't work :( – geenux Nov 29 '13 at 20:55
  • Nevermind, it's `GLint vao; glGenVertexArrays( 1, &vao ); glBindVertexArray( vao ); glDrawArrays( GL_POINTS, 0, 1 );` – geenux Dec 04 '13 at 15:52
7

This is similar to the answer by demanze, but I would argue it's easier to understand. Also, it is drawn with only 4 vertices by using TRIANGLE_STRIP.

#version 300 es
out vec2 textureCoords;

void main() {
    const vec2 positions[4] = vec2[](
        vec2(-1, -1),
        vec2(+1, -1),
        vec2(-1, +1),
        vec2(+1, +1)
    );
    const vec2 coords[4] = vec2[](
        vec2(0, 0),
        vec2(1, 0),
        vec2(0, 1),
        vec2(1, 1)
    );

    textureCoords = coords[gl_VertexID];
    gl_Position = vec4(positions[gl_VertexID], 0.0, 1.0);
}
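
The matching draw call follows the same empty-VAO pattern as in the other answers - a sketch, assuming a GL(ES) 3.x context:

glBindVertexArray(emptyVAO);              // empty VAO (needed in desktop core profile; see the other answers for creation)
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);    // the 4 vertices indexed by gl_VertexID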
Magnus
  • 360
  • 3
  • 8
  • 19
2

The following comes from the draw function of the class that draws FBO textures to a screen-aligned quad.

Gl.glUseProgram(shad);      

Gl.glBindBuffer(Gl.GL_ARRAY_BUFFER, vbo);           
Gl.glEnableVertexAttribArray(0);
Gl.glEnableVertexAttribArray(1);
Gl.glVertexAttribPointer(0, 3, Gl.GL_FLOAT, Gl.GL_FALSE, 0, voff);
Gl.glVertexAttribPointer(1, 2, Gl.GL_FLOAT, Gl.GL_FALSE, 0, coff);  

Gl.glActiveTexture(Gl.GL_TEXTURE0);
Gl.glBindTexture(Gl.GL_TEXTURE_2D, fboc);
Gl.glUniform1i(tileLoc, 0);

Gl.glDrawArrays(Gl.GL_QUADS, 0, 4);   // note: GL_QUADS is only available in the compatibility profile; use two triangles or a triangle strip in core profile

Gl.glBindTexture(Gl.GL_TEXTURE_2D, 0);
Gl.glBindBuffer(Gl.GL_ARRAY_BUFFER, 0); 

Gl.glUseProgram(0); 

The actual quad itself and the coords come from:

private float[] v=new float[]{  -1.0f, -1.0f, 0.0f,
                                1.0f, -1.0f, 0.0f,
                                1.0f, 1.0f, 0.0f,
                                -1.0f, 1.0f, 0.0f,

                                0.0f, 0.0f,
                                1.0f, 0.0f,
                                1.0f, 1.0f,
                                0.0f, 1.0f
};

The binding and setup of the VBOs I leave to you.
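
That said, a rough sketch of what that setup could look like in plain C (v is the 20-float array above; voff and coff are the offsets used in the draw code):

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, 20 * sizeof(GLfloat), v, GL_STATIC_DRAW);

// positions start at the beginning of the buffer, texcoords after the 12 position floats
const GLvoid* voff = (const GLvoid*)0;
const GLvoid* coff = (const GLvoid*)(12 * sizeof(GLfloat));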

The vertex shader:

#version 330

layout(location = 0) in vec3 pos;
layout(location = 1) in vec2 coord;

out vec2 coords;

void main() {
    coords=coord.st;
    gl_Position=vec4(pos, 1.0);
}

Because the position is raw, that is, not multiplied by any matrix, the (-1,-1) to (1,1) corners of the quad fit the viewport exactly. Look for Alfonse's tutorial linked off any of his posts on openGL.org.

vines
  • 5,160
  • 1
  • 27
  • 49
ste3e
  • 995
  • 1
  • 9
  • 18