I have 3D textures holding colors, normals and other data for my voxelized scene, and because some of this data can't simply be averaged, I need to compute the mip levels myself. The 3D textures are (128+64) x 128 x 128; the additional 64 x 128 x 128 region is for the mip levels.

So when I take the first mip level, which is at (0, 0, 0) with a size of 128 x 128 x 128, and just copy voxels to the second level at (128, 0, 0), the data appears there. But as soon as I copy the second level at (128, 0, 0) to the third at (128, 0, 64), the data doesn't appear in the third level.

Shader code:

#version 450 core

layout (local_size_x = 1,
        local_size_y = 1,
        local_size_z = 1) in;

layout (location = 0) uniform uint resolution;
layout (binding = 0, rgba32f) uniform image3D voxel_texture;

void main()
{
    ivec3 index = ivec3(gl_WorkGroupID);
    ivec3 spread_index = index * 2;

    vec4 voxel = imageLoad(voxel_texture, spread_index);
    imageStore(voxel_texture, index + ivec3(resolution, 0, 0), voxel);

    // This isn't working
    voxel = imageLoad(voxel_texture, spread_index + 
                      ivec3(resolution, 0, 0));
    imageStore(voxel_texture, index + ivec3(resolution, 0, 64), voxel);
}

The shader program is dispatched with:

glUniform1ui(0, OCTREE_RES);

glBindImageTexture(0, voxel_textures[0], 0, GL_TRUE, 0, GL_READ_WRITE, 
                   GL_RGBA32F);

glDispatchCompute(64, 64, 64);

I don't know if I missed something basic; this is my first compute shader. I also tried using memory barriers, but that didn't change anything.

FamZ

1 Answer

Well, you can't expect your second imageLoad to read texels that you just wrote with your first imageStore like that. The texels it reads are written by other work groups, which may not have run yet.

And there is no way to synchronize access across work groups within a single dispatch; barriers in the shader only cover the 'local' workgroup.

You'll need either:

  • To use multiple invocations of your kernel, one per level
  • To rewrite your shader logic so that you always fetch from the 'original' zone.
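A rough sketch of the first option, on the host side: plan one pass per mip level, then issue one dispatch per pass with a barrier in between. The `MipPass` struct and `plan_mip_passes` helper are made-up names for illustration, and the layout (levels stacked along z in the extra column at x = res) is an assumption taken from the offsets in the question. The GL calls are only shown in a comment, with hypothetical uniform locations.

```c
#include <assert.h>

/* One downsampling pass: where it reads, where it writes, and the
   edge length of the level it produces. (Hypothetical helper type.) */
typedef struct {
    int src_x, src_y, src_z;  /* origin of the level being read    */
    int dst_x, dst_y, dst_z;  /* origin of the level being written */
    int dst_size;             /* edge length of the written level  */
} MipPass;

/* Fills passes[] for a base level of edge length `res` at (0, 0, 0).
   Assumed layout: smaller levels are stacked along z at x = res,
   i.e. 64^3 at (res, 0, 0), 32^3 at (res, 0, 64), and so on.
   Returns the number of passes written. */
int plan_mip_passes(int res, MipPass *passes, int max_passes)
{
    assert(res > 0 && (res & (res - 1)) == 0);  /* power of two */

    int count = 0;
    int src_x = 0, src_z = 0;  /* first pass reads level 0 */
    int dst_z = 0;

    for (int size = res / 2; size >= 1 && count < max_passes; size /= 2) {
        MipPass p = { src_x, 0, src_z, res, 0, dst_z, size };
        passes[count++] = p;

        /* The level just written becomes the source of the next pass. */
        src_x = res;
        src_z = dst_z;
        dst_z += size;
    }
    return count;
}

/* Intended use, one dispatch per pass (uniform locations 1 and 2 are
   hypothetical; the shader would add them to its load/store indices):

     for (int i = 0; i < n; ++i) {
         glUniform3i(1, p[i].src_x, p[i].src_y, p[i].src_z);
         glUniform3i(2, p[i].dst_x, p[i].dst_y, p[i].dst_z);
         glDispatchCompute(p[i].dst_size, p[i].dst_size, p[i].dst_size);
         glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
     }
*/
```

With res = 128 this gives seven passes; the first two match the question's offsets, (0, 0, 0) to (128, 0, 0) and then (128, 0, 0) to (128, 0, 64). The barrier between dispatches is what makes the writes of one pass visible to the image loads of the next.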
246tNt
  • Right, I thought barriers worked globally... Do you know if it would be less efficient to use one single "global" workgroup with a 128x128x128 local size than your 2 options? – FamZ May 31 '16 at 17:53
  • @FamZ: You assume that a 128x128x128 workgroup size would be possible. The [largest number of work items in a group that is supported by any hardware is 1536](http://opengl.gpuinfo.org/gl_stats_caps_single.php?listreportsbycap=GL_MAX_COMPUTE_WORK_GROUP_SIZE). That's much less than 128^3. What you want simply isn't possible. It's better to restructure your algorithm, and possibly data, to fit into a reasonable work group size. – Nicol Bolas May 31 '16 at 19:03
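
To put numbers on that last comment, here is a small sketch of the arithmetic. The helper names are made up, and the 8x8x8 local size is only an example choice, not something from the thread; the actual limit on a given driver can be queried with GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS.

```c
#include <assert.h>

/* Total invocations in one work group of the given local size. */
long invocations_per_group(int lx, int ly, int lz)
{
    return (long)lx * ly * lz;
}

/* Work groups needed along one axis to cover `extent` voxels with
   `local_size` invocations per group, rounded up. */
int group_count(int extent, int local_size)
{
    assert(extent > 0 && local_size > 0);
    return (extent + local_size - 1) / local_size;
}
```

A 128x128x128 group would need 2,097,152 invocations, far beyond the 1536 quoted above, while an 8x8x8 group needs 512. Covering the 64^3 second level with 8x8x8 groups takes `group_count(64, 8)` = 8 groups per axis, so the dispatch would be `glDispatchCompute(8, 8, 8)` with the shader indexing by `gl_GlobalInvocationID` instead of `gl_WorkGroupID`.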