I've run into a bit of an issue writing a fragment shader for a project. I'm creating a palette-less terminal emulator, so I figured I'd do this with the following shader:
#version 110
uniform sampler2D tileset;
uniform sampler2D indices;
uniform sampler2D colors;
uniform sampler2D bgcolors;
uniform vec2 tileset_size;
uniform vec2 size;
varying vec2 tex_coord;
void main(void)
{
    // Calculated texture coordinate
    vec2 screen_pos = vec2(gl_FragCoord.x / 800.0, 1.0 - gl_FragCoord.y / 500.0);
    // Indirect texture lookup 1
    vec2 index = texture2D(indices, screen_pos.st).rg;
    vec4 color = texture2D(colors, screen_pos.st);
    vec4 bgcolor = texture2D(bgcolors, screen_pos.st);
    // Calculated texture coordinate
    vec2 tileCoord;
    // 256.0 because the [0,256) byte value is normalized on [0,1)
    tileCoord.x = mod(screen_pos.x, 1.0/size.x)*(size.x/tileset_size.x) + floor(index.x*256.0)/tileset_size.x;
    tileCoord.y = mod(screen_pos.y, 1.0/size.y)*(size.y/tileset_size.y) + floor(index.y*256.0)/tileset_size.y;
    // Indirect texture lookup 2
    vec4 tile = texture2D(tileset, tileCoord);
    vec4 final = tile*color;
    gl_FragColor = vec4(mix(bgcolor.rgb, final.rgb, final.a), 1.0);
}
To render this to the screen, I draw one big quad and let the shader do the rest.
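For context, the host-side setup looks roughly like the following. This is a minimal sketch rather than my exact code: the handle names (program, tileset_tex, and so on), the texture-unit assignments, and the 80x25 grid / 16x16 tileset numbers are placeholders.
// Sketch of the host-side setup; handle names and sizes are placeholders.
glUseProgram(program);
// Bind each texture to its own unit and point the samplers at those units.
glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, tileset_tex);
glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, indices_tex);
glActiveTexture(GL_TEXTURE2); glBindTexture(GL_TEXTURE_2D, colors_tex);
glActiveTexture(GL_TEXTURE3); glBindTexture(GL_TEXTURE_2D, bgcolors_tex);
glUniform1i(glGetUniformLocation(program, "tileset"), 0);
glUniform1i(glGetUniformLocation(program, "indices"), 1);
glUniform1i(glGetUniformLocation(program, "colors"), 2);
glUniform1i(glGetUniformLocation(program, "bgcolors"), 3);
glUniform2f(glGetUniformLocation(program, "size"), 80.0f, 25.0f);          // e.g. 80x25 character cells
glUniform2f(glGetUniformLocation(program, "tileset_size"), 16.0f, 16.0f);  // e.g. 16x16 glyphs in the tileset
// One quad covering the viewport (identity matrices); the shader does the rest.
glBegin(GL_QUADS);
    glVertex2f(-1.0f, -1.0f);
    glVertex2f( 1.0f, -1.0f);
    glVertex2f( 1.0f,  1.0f);
    glVertex2f(-1.0f,  1.0f);
glEnd();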
This code generates the desired output, but it does so at 5 seconds per frame. From what I've researched, this is likely because the display driver is executing my shader in software rather than in hardware. I found that commenting out texture2D() calls made things run smoothly again.
This led me to the following code:
void main(void)
{
    //vec2 screen_pos = vec2(gl_FragCoord.x / 800.0, 1.0 - gl_FragCoord.y / 500.0);
    vec2 screen_pos = vec2(0.5, 0.5);
    vec2 index = texture2D(indices, screen_pos.st).rg;
    vec4 color = texture2D(colors, screen_pos.st);
    vec4 bgcolor = texture2D(bgcolors, screen_pos.st);
    vec4 tiles = texture2D(tileset, screen_pos.st);
    gl_FragColor = vec4(index.rgg + color.rgb + bgcolor.rgb + tiles.rgb, 1.0);
}
This turned out to be just as awfully slow. Commenting out that last texture lookup, vec4 tiles = ..., and removing it from the output made it run smoothly again. So I looked up my device's limits on texture lookups and got the following results:
GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB: 8
GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS_ARB: 16
GL_MAX_TEXTURE_IMAGE_UNITS_ARB: 8
GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB: 8
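For reference, these values can be queried along the following lines (a sketch, assuming GLEW has loaded the entry points; note that the indirection limit is an ARB_fragment_program per-target query rather than a plain glGetIntegerv):
GLint vtx_units, comb_units, frag_units, indirections;
glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB, &vtx_units);
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS_ARB, &comb_units);
glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS_ARB, &frag_units);
// The indirection limit lives in ARB_fragment_program and is queried per target.
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB, &indirections);
printf("GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB: %d\n", vtx_units);
printf("GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS_ARB: %d\n", comb_units);
printf("GL_MAX_TEXTURE_IMAGE_UNITS_ARB: %d\n", frag_units);
printf("GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB: %d\n", indirections);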
So something must be up. Even if each of my calls were indirect accesses (which I'm pretty sure they're not), I should have up to 8 of them! Additionally, glGetShaderInfoLog() and glGetProgramInfoLog() have nothing to say.
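(For what it's worth, the log check is nothing fancy; something along these lines, where frag_shader and program stand in for my actual handles:)
GLint status = GL_FALSE;
char log[4096] = "";
glGetShaderiv(frag_shader, GL_COMPILE_STATUS, &status);
glGetShaderInfoLog(frag_shader, sizeof(log), NULL, log);
printf("compile status: %d, log: \"%s\"\n", status, log);
glGetProgramiv(program, GL_LINK_STATUS, &status);
glGetProgramInfoLog(program, sizeof(log), NULL, log);
printf("link status: %d, log: \"%s\"\n", status, log);
// Both logs come back empty; the shader compiles and links fine.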
I should list my specs:
- Machine: Intel Atom Duo running Linux 3.17.1 (Arch, specifically)
- GPU: Intel 945GM/GMS/GME, 943/940GML Integrated Graphics Controller
- Mesa version: 10.4.5
And yes, I am checking for GL_ARB_fragment_program after calling the standard glewInit() procedure.
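(Concretely, that check amounts to something like this, using GLEW's extension flag:)
if (glewInit() != GLEW_OK) {
    fprintf(stderr, "glewInit() failed\n");
    exit(EXIT_FAILURE);
}
// GLEW exposes the extension as a boolean flag after glewInit().
if (!GLEW_ARB_fragment_program) {
    fprintf(stderr, "GL_ARB_fragment_program not reported by the driver\n");
    exit(EXIT_FAILURE);
}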
So, I have two possible explanations in mind.
- The spec sheet for ARB_fragment_shader states that the minimum number of texture indirections should be 4. It could be that my program hasn't initialized ARB_fragment_program correctly, and the system is falling back to the default. (I tried putting "ARB" in as many shader-related places as I could, but I think glewInit() takes care of this anyway.)
- Mesa's compiler has a bug with my particular chip. The final post here mentions this and describes a similar-sounding GPU. Basically, the compiler falsely labels all texture reads as indirect texture reads, thereby rejecting the program for hardware execution when it shouldn't.
If anyone has any incredible knowledge in this area, I'd really like to hear it. Normally I'd say "screw it, get a better computer," but the sheer irony of having a high-end graphics card just to run a terminal emulator is.. well.. ironic.
If I've forgotten to write some information here, let me know.
Edits
glxinfo -l: pastebin
ARB assembly (partially generated by cgc)
Disabling any one of the TEX instructions puts it in hardware mode; with all 4 enabled it falls back to software.
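If it helps frame the question: after loading the assembly version with glProgramStringARB(), the native-limit queries would look something like this (just a sketch):
GLint under_native = 0, native_indirections = 0;
// Only meaningful right after glProgramStringARB() has loaded the fragment program.
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &under_native);
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB, &native_indirections);
printf("under native limits: %d, native tex indirections: %d\n", under_native, native_indirections);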