
I wrote some code, too long to paste here, that renders into a 3D, one-component float texture via a fragment shader that uses bindless imageLoad and imageStore.

That code is definitely working.

I then needed to work around some GLSL compiler bugs, so I wanted to read the 3D texture above back to the host via glGetTexImage. Yes, I did do a glMemoryBarrierEXT(GL_ALL_BARRIER_BITS). I did check the texture info via glGetTexLevelParameteriv() and everything I see matches. I did check for OpenGL errors, and there are none.

Sadly, though, glGetTexImage never seems to return what was written by the fragment shader. Instead, it only returns the placeholder values I put in when I called glTexImage3D() to create the texture.

Is that expected behavior? The documentation implies otherwise.

If glGetTexImage actually works that way, how can I read back the data in that 3D texture (which is resident on the device)? Clearly the driver can do that, since it does so when the texture is made non-resident. Surely there's a simple way to do this simple thing...


To clarify, I was asking whether glGetTexImage is supposed to work that way or not. Here's the code:

void Bindless3DArray::dump_array(Array3D<float> &out)
{
    bool was_mapped = m_image_mapped;
    if (was_mapped)
        unmap_array();          // unmap array so it's accessible to OpenGL

    out.resize(m_depth, m_height, m_width);

    glBindTexture(GL_TEXTURE_3D, m_textureid);  // from glGenTextures()

#if 0
    int w, h, d;
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_WIDTH, &w);
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_HEIGHT, &h);
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_DEPTH, &d);
    int internal_format;
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_INTERNAL_FORMAT, &internal_format);
    int data_type_r, data_type_g;
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_RED_TYPE, &data_type_r);
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_GREEN_TYPE, &data_type_g);
    int size_r, size_g;
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_RED_SIZE, &size_r);
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_GREEN_SIZE, &size_g);
#endif

    glGetTexImage(GL_TEXTURE_3D, 0, GL_RED, GL_FLOAT, &out(0,0,0));
    glBindTexture(GL_TEXTURE_3D, 0);
    CHECK_GLERROR();

    if (was_mapped)
        map_array_to_cuda();    // restore state
}

Here's the code that creates the bindless array:

void Bindless3DArray::allocate(int w, int h, int d, ElementType t)
{
    if (!m_textureid)
        glGenTextures(1, &m_textureid);
    m_type = t;
    m_width = w;
    m_height = h;
    m_depth = d;

    glBindTexture(GL_TEXTURE_3D, m_textureid);
    CHECK_GLERROR();
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAX_LEVEL, 0);    // ensure only 1 miplevel is allocated
    CHECK_GLERROR();

    Array3D<float> foo(d, h, w);
    // DEBUG -- glGetTexImage returns THIS data, not what's on the device
    for (int z = 0; z < m_depth; ++z)
        for (int y = 0; y < m_height; ++y)
            for (int x = 0; x < m_width; ++x)
                foo(z,y,x) = 3.14159;

    //-- Texture creation
    if (t == ElementInteger)
        glTexImage3D(GL_TEXTURE_3D, 0, GL_R32UI, w, h, d, 0, GL_RED_INTEGER, GL_INT, 0);
    else if (t == ElementFloat)
        glTexImage3D(GL_TEXTURE_3D, 0, GL_R32F,  w, h, d, 0, GL_RED, GL_FLOAT, &foo(0,0,0));
    else
        throw "Invalid type for Bindless3DArray";
    CHECK_GLERROR();

    m_handle = glGetImageHandleNV(m_textureid, 0, true, 0, (t == ElementInteger) ? GL_R32UI : GL_R32F);
    glMakeImageHandleResidentNV(m_handle, GL_READ_WRITE);
    CHECK_GLERROR();

#ifdef USE_CUDA
    checkCuda(cudaGraphicsGLRegisterImage(&m_image_resource, m_textureid, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore));
#endif
}

I allocate the array, render to it via an OpenGL fragment program, and then I call dump_array() to read the data back. Sadly, I only get what I loaded in the allocate call.
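To spell out that sequence as a usage sketch (names are as in the code above; width, height, and depth are just placeholders for the actual dimensions):

Bindless3DArray deepz_array;
deepz_array.allocate(width, height, depth, ElementFloat);
// ... render pass whose fragment shader does the imageStore writes ...
Array3D<float> readback;
deepz_array.dump_array(readback);   // expected to contain what the shader wrote, but doesn't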

The render program looks like this:

void App::clear_deepz()
{
    deepz_clear_program.bind();

    deepz_clear_program.setUniformValue("sentinel", SENTINEL);
    deepz_clear_program.setUniformValue("deepz", deepz_array.handle());
    deepz_clear_program.setUniformValue("sem", semaphore_array.handle());

    run_program();

    glMemoryBarrierEXT(GL_ALL_BARRIER_BITS);
//  glMemoryBarrierEXT(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
//  glMemoryBarrierEXT(GL_SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV);

    deepz_clear_program.release();
}

and the fragment program is:

#version 420

in vec4 gl_FragCoord;
uniform float sentinel;
coherent uniform layout(size1x32) image3D deepz;
coherent uniform layout(size1x32) uimage3D sem;

void main(void)
{
    ivec3 coords = ivec3(gl_FragCoord.x, gl_FragCoord.y, 0);
    imageStore(deepz, coords, vec4(sentinel));
    imageStore(sem, coords, ivec4(0));
    discard;    // don't write to FBO at all
}
Walt Donovan

2 Answers

discard;    // don't write to FBO at all

That's not what discard means. Oh, it does mean that. But it also means that all Image Load/Store writes will be discarded too. Indeed, odds are, the compiler will see that statement and just do nothing for the entire fragment shader.

If you want to just execute the fragment shader, you can employ the GL 4.3 feature (available on your NVIDIA hardware) of having an empty framebuffer object. Or you could use a compute shader. If you can't use GL 4.3 yet, then use a write mask to turn off all color writes.
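For reference, here is a minimal sketch of both approaches. It assumes a GL 4.3 context, and empty_fbo, fbo_width, and fbo_height are placeholder names rather than anything from the question's code:

GLuint empty_fbo = 0;
glGenFramebuffers(1, &empty_fbo);
glBindFramebuffer(GL_FRAMEBUFFER, empty_fbo);
// No attachments; tell GL how large the rasterized area should be
// (GL 4.3 / ARB_framebuffer_no_attachments).
glFramebufferParameteri(GL_FRAMEBUFFER, GL_FRAMEBUFFER_DEFAULT_WIDTH,  fbo_width);
glFramebufferParameteri(GL_FRAMEBUFFER, GL_FRAMEBUFFER_DEFAULT_HEIGHT, fbo_height);

// Pre-4.3 fallback: keep the current framebuffer bound and mask off all writes.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);

Either way the fragment shader still runs per fragment, and the imageStore calls remain its only visible side effect.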

Nicol Bolas
  • Interesting you say that because I'm certain that the imageStore actually stored stuff. Regardless, I'll try what you suggest and file a bug with nvidia if I have to. – Walt Donovan Jun 06 '13 at 23:31
  • @WaltDonovan: It's not a bug. The OpenGL specification *requires this*. If you issue a `discard`, then every visible action the fragment shader does *must* be thrown away. – Nicol Bolas Jun 06 '13 at 23:47
  • Lovely -- I found this single reference deep in the GL_EXT_shader_image_load_store spec: (20) What happens if a shader specifies an image store or atomic operation for killed/discarded pixels? RESOLVED: No stores occur when this happens. So the fact that it seemed to work is actually yet another bug in the driver. – Walt Donovan Jun 06 '13 at 23:53
  • Crap. Using an empty FBO didn't fix the issue (and in fact gave identical results.) I commented out all of the discards. Any ideas now? – Walt Donovan Jun 07 '13 at 00:38
  • @WaltDonovan: What happens when you don't use NVIDIA's bindless stuff? – Nicol Bolas Jun 07 '13 at 02:28
  • Well that'd be attempt #7 to get this code to work... I much prefer the bindless approach as it's closer to c/c++. Hopefully I'll hear from nvidia shortly about the bug reports I've sent them, and I'll read up again on the old way of doing things before bindless. – Walt Donovan Jun 07 '13 at 03:18

As Nicol mentions above, if you want only the side effects of image load and store, the proper way is to use an empty framebuffer object.

The problem with mixing glGetTexImage() and bindless textures was in fact a driver bug, and it has been fixed as of driver version 335.23. I filed the bug report and have confirmed that my code now works properly.

Note that I am now using empty framebuffer objects in the code and no longer use discard.
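For completeness, a rough sketch of what the clear pass might look like when driving an empty FBO; this is not the exact code, and empty_fbo plus the viewport size are assumed names:

glBindFramebuffer(GL_FRAMEBUFFER, empty_fbo);
glViewport(0, 0, m_width, m_height);        // rasterize the full image extent

deepz_clear_program.bind();
deepz_clear_program.setUniformValue("sentinel", SENTINEL);
deepz_clear_program.setUniformValue("deepz", deepz_array.handle());
deepz_clear_program.setUniformValue("sem", semaphore_array.handle());
run_program();                              // same shader as before, minus the discard
deepz_clear_program.release();

glMemoryBarrierEXT(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);   // make the imageStore results visible
glBindFramebuffer(GL_FRAMEBUFFER, 0);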

Walt Donovan