
I am experimenting on iOS to try to build a CPU-GPU-hybrid JPEG encoder. From my tests on the CPU, I believe using the GPU for the DCT and quantization steps makes good sense and should boost overall performance significantly (compressing a huge number of JPEGs is the bottleneck in my app). With transform feedback this should be doable, as I have used it to get great results in GPGPU computing. The tricky part is how to get the data (unsigned 8-bit RGBA values) in efficiently.

As mentioned, I have used OpenGL ES 3.0 for GPGPU computing before, but I only have experience with floating-point textures, which are set up with

glTexImage2D(GL_TEXTURE_2D,0,GL_RGBA32F,WIDTH,HEIGHT,0,GL_RGBA,GL_FLOAT,data);

and read in the shaders with

texelFetch()

But now my input data is stored as an array of unsigned bytes (uint8), and I need to fetch 64 of them sequentially each time. I think I can either fetch them as a texture of unsigned bytes or, more efficiently, as a texture of unsigned integers and then separate the bytes with bit shifts.
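To make the "unsigned integers plus bit shifts" idea concrete, here is a minimal CPU-side sketch in C of the packing and unpacking arithmetic. The function names (pack4, unpack_byte, pack_block64) are hypothetical, and the byte order (first byte in the lowest 8 bits) is an assumption chosen for illustration; a shader would have to use the same shifts and masks on whatever layout the upload actually produces.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four consecutive bytes into one 32-bit word. The first byte lands in
   the lowest 8 bits; explicit shifts keep this independent of host endianness.
   (Assumed layout, for illustration only.) */
uint32_t pack4(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Recover byte `index` (0..3) from a packed word with a shift and a mask. */
uint8_t unpack_byte(uint32_t word, unsigned index)
{
    return (uint8_t)((word >> (8u * index)) & 0xFFu);
}

/* Pack one 64-byte block into 16 words, i.e. four RGBA texels of
   four 32-bit components each if uploaded as a 32-bit integer texture. */
void pack_block64(const uint8_t src[64], uint32_t dst[16])
{
    for (size_t i = 0; i < 16; ++i)
        dst[i] = pack4(src + 4 * i);
}
```

With this layout, byte i of the block is unpack_byte(dst[i / 4], i % 4), which is the same expression a shader with true integer support would evaluate per component.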

My question is, how do I actually do either of them? More specifically, how should I set the internalFormat, format and type for glTexImage2D()? I tried a lot of combinations, but all of them deliver only 0 in the shaders (and I double-checked that the source data is non-zero).

Nicol Bolas
Jason M
  • For a normal RGBA texture you already have 4 bytes per pixel. glTexImage2D() can be used to fetch byte by byte, as each vec4 component represents one of these bytes. The values are normalized, so you need to multiply them by 255 to get an integer value representing a byte. So you may get 64 bytes by making 16 (64/4) calls to glTexImage2D(). Or am I missing something here? – Matic Oblak Apr 08 '16 at 11:51
  • Sorry, not the glTexImage2D(). I mean the texel fetch in shader. – Matic Oblak Apr 08 '16 at 12:04
  • Yes, that was my plan B. The problem I have with this one is that I do not know how to fetch textures stored as unsigned bytes (or how to configure glTexImage2D for that). I would love to jump to plan A if possible, in which I fetch an integer (of four bytes) instead of a single byte per component, which would reduce the number of fetches to 64/4/4=4. – Jason M Apr 08 '16 at 12:41
  • I am not sure you will be able to configure the texture to store more than 32 bits per pixel (or less, actually), so it might be impossible to achieve plan A. But if you do find something solid on this I would be glad to hear about it... Maybe some extension that is generally supported or something. As for plan B: GL_RGBA is used to configure storage of 4 unsigned bytes per pixel. As mentioned above, you should then scale the values by multiplying them by 255. – Matic Oblak Apr 08 '16 at 12:46

1 Answer


In ES 3, seriously consider creating a pixel unpack buffer and mapping it in order to get a location into which to write your pixel data. That will at least save a driver-internal memcpy and can be used to decrease synchronisation significantly. See GL_PIXEL_UNPACK_BUFFER on glBindBuffer and gl[Un]MapBuffer[Range]; you'll end up with a glTexImage2D(..., (void *)0); to specify the pixel unpack buffer as the source, analogously to the way that bound buffers are specified as the source for attributes, elements, etc. See glFenceSync for synchronisation, assuming you use GL_MAP_UNSYNCHRONIZED_BIT and thereby intend to handle synchronisation yourself.

For full-integer RGBA (no scaling) use GL_RGBA8UI as the internal format, GL_RGBA_INTEGER as the format and GL_UNSIGNED_BYTE as the type; then declare a usampler2D ('u' for unsigned, implicitly integer) and use a standard texture(sampler, coordinate) to sample.

You'll also want GL_CLAMP_TO_EDGE and GL_NEAREST texture parameters.
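Putting the format triple and the texture parameters together, a minimal sketch in C might look like the following. It assumes a current OpenGL ES 3.0 context and the iOS header path shown; the function name make_rgba8ui_texture and its parameters are hypothetical. The filter settings matter more than they look: integer textures cannot be linearly filtered, and a texture left on the default mipmapped minification filter without mipmaps is incomplete, which is one plausible (unconfirmed) reason for reading only zeros in the shader.

```c
#include <OpenGLES/ES3/gl.h>   /* iOS header path; assumed, differs on other platforms */

/* Create an unsigned-integer RGBA texture from tightly packed 8-bit RGBA data.
   Requires a current OpenGL ES 3.0 context. Hypothetical helper for illustration. */
GLuint make_rgba8ui_texture(GLsizei width, GLsizei height, const GLubyte *pixels)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    /* Integer textures must use nearest filtering. */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    /* Internal format GL_RGBA8UI pairs with format GL_RGBA_INTEGER
       and type GL_UNSIGNED_BYTE; values reach the shader unscaled. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8UI, width, height, 0,
                 GL_RGBA_INTEGER, GL_UNSIGNED_BYTE, pixels);

    glBindTexture(GL_TEXTURE_2D, 0);
    return tex;
}
```

If a pixel unpack buffer is bound at the time of the glTexImage2D call, the last argument is interpreted as an offset into that buffer rather than a client pointer, so the same call shape covers both upload paths.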

EDIT: also potentially worth mentioning, the values coming from a usampler2D are of type uvec4, so they're integral. Unlike ES 2, which permits integers to be emulated by floats, ES 3 has true integers, including bitwise operators (for those of us from the '90s, this truly is an unexpected future). So here is a simplified snippet from a recent emulation project of mine, trivial but still worth showing:

vec4 rgb_sample(usampler2D sampler, vec2 coordinate)
{
    uint texValue = texture(sampler, coordinate).r;
    return vec4(texValue & 4u, texValue & 2u, texValue & 1u, 1.0);
}

Which, of course, unpacks a TTL-style RGB-in-one-byte single-channel texture into a format suitable for the fragment colour output (relying upon saturation, since any non-zero channel value clamps to 1.0).

Tommy
  • Thanks for the tons of details! It makes good sense to me and I will update how it worked out after trying out. :) – Jason M Apr 09 '16 at 15:14