
Is it that memoryBarrier in GLSL orders memory transactions within a single shader invocation, and that glMemoryBarrier in the OpenGL API orders memory transactions across multiple shader invocations (which are not necessarily of the same program)?

viktorzeid

1 Answer


Is it that memoryBarrier in GLSL orders memory transactions within a single shader invocation, and that glMemoryBarrier in the OpenGL API orders memory transactions across multiple shader invocations (which are not necessarily of the same program)?

Not exactly. You should start by making clear what a shader invocation is: it is the execution of the shader code for a single input entity. So there is a vertex shader invocation for every vertex of a draw call you make, and at least one fragment shader invocation for every fragment produced by rasterization (there can be more than one with certain types of multisampling). Shader invocations of different draw calls (with possibly different shaders) are of course also different invocations. But typically, when one speaks of "multiple invocations", one means invocations of the same shader during the same draw call (which are all potentially executed in parallel).

The GLSL specification (version 4.40, section 8.17) has this to say about memory barriers:

Shaders of all types may read and write the contents of textures and buffer objects using image variables. While the order of reads and writes within a single shader invocation is well-defined, the relative order of reads and writes to a single shared memory address from multiple separate shader invocations is largely undefined. The order of memory accesses performed by one shader invocation, as observed by other shader invocations, is also largely undefined but can be controlled through memory control functions.

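So the GLSL memoryBarrier() controls the order of one invocation's memory accesses as observed by other shader invocations. This might or might not be what you meant by your statement above, depending on what you meant by "single invocation": it matches only if you interpreted that as "all invocations of a single draw call".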
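To make the shader-side case concrete, here is a minimal sketch of a compute shader that writes data and then publishes it, with memoryBarrier() between the two writes. The block name Results, its members readyCount and payload, and the binding point are assumptions made up for this illustration; they are not from the question or the spec.

```glsl
#version 440 core

layout(local_size_x = 64) in;

// Hypothetical buffer for illustration. 'coherent' requests that writes
// through this block become visible to other shader invocations.
layout(std430, binding = 0) coherent buffer Results {
    uint readyCount;   // number of payload slots published so far
    uint payload[];    // one slot per invocation
};

void main() {
    uint id = gl_GlobalInvocationID.x;

    // 1. Write this invocation's data.
    payload[id] = id * 2u;

    // 2. Order the store above before the atomic below, as observed by
    //    other invocations. Without this, another invocation could see the
    //    incremented counter before it can see the payload write.
    memoryBarrier();

    // 3. Publish: signal that one more slot is complete.
    atomicAdd(readyCount, 1u);
}
```

Note that memoryBarrier() only orders the memory accesses of the invocation that calls it. It does not make any invocation wait for another one.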

This is from the OpenGL 4.4 core profile specification, section 7.12.2:

Explicit synchronization is required to ensure that the effects of buffer and texture data stores performed by shaders will be visible to subsequent operations using the same objects and will not overwrite data still to be read by previously requested operations. Without manual synchronization, shader stores for a “new” primitive may complete before processing of an “old” primitive completes. Additionally, stores for an “old” primitive might not be completed before processing of a “new” primitive starts.

So this is also not only about shader invocations of a following draw call. It does not even need new shader invocations at all: if a following GL command uses or overwrites data that your shader has written to, you have to synchronize this manually. Note that this is relevant only if your shader writes to buffers or textures; it is not relevant for the "ordinary" framebuffer writes through the pipeline:

The relative order of invocations of the same shader type are undefined. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are written to the framebuffer in primitive order, stores executed by fragment shader invocations are not.
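The difference between the two kinds of writes can be sketched with a fragment shader that does both. The image uniform uLog, its r32f format and its binding are assumptions for illustration only; the API calls named in the comments are the real entry points, but where exactly they go depends on your application.

```glsl
#version 440 core

// Hypothetical image binding for illustration.
layout(r32f, binding = 0) writeonly uniform image2D uLog;

layout(location = 0) out vec4 fragColor;

void main() {
    // Ordinary framebuffer output: written in primitive order by the
    // pipeline, so no manual synchronization is needed for it.
    fragColor = vec4(1.0);

    // Image store: a side effect outside the main pipeline. Its completion
    // order relative to other invocations and to later GL commands is not
    // guaranteed.
    imageStore(uLog, ivec2(gl_FragCoord.xy), vec4(gl_FragCoord.z));
}

// Application side (C API, shown as a comment to keep this sketch in GLSL):
// between the draw call that runs this shader and a later draw call that
// samples the same texture, the application would issue
//     glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);
```

If the later command accesses the texture in a different way, a different barrier bit applies. For example, reading it back with glGetTexImage is covered by GL_TEXTURE_UPDATE_BARRIER_BIT, and image loads in a later shader by GL_SHADER_IMAGE_ACCESS_BARRIER_BIT.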

I recommend reading the whole of section 7.12, which is far too long to paste here but is crucial for understanding the GL's memory barrier functions.

derhass
  • What is not clear is why a CPU-side GL API function glMemoryBarrier exists at all. It makes sense to have GLSL functions, because in a GPU shader we are working in the middle of the pipeline's execution. But the CPU functions need explaining: since CPU functions are appended to command lists, they seem to make no sense for controlling GPU memory ordering, because their issue is not synchronized with shader execution. – v.oddou Aug 27 '14 at 08:13
  • @v.oddou: It is not about the CPU per se. It is about the following GL commands. When you have a shader which, for example, writes to an image via image store functionality, and you then issue a render command which reads from that resource (e.g. as a texture), or you simply try to read back the image data, you must synchronize this manually, as the GL will not do it for you automatically (it will do so only for the default framebuffer effects of drawing commands, but not for anything "besides" the main pipeline). – derhass Aug 27 '14 at 11:53
  • I see, this makes sense. But it remains somewhat disturbing because of the discrepancy between "normal" pipelining and the rest. Naturally, one would think all side effects are finished between two commands (since they are high-level API commands). But it's OK; I guess it's imaginable that a pipelined command may start before the previous one's side effects are finished, since the GPU pipeline is so long. – v.oddou Aug 27 '14 at 13:04