
When I use the following code:


#define MAX_RADIUS 55
#define KERNEL_SIZE (MAX_RADIUS * 2 + 1)
...
float kernel[KERNEL_SIZE];
...
float4 PS_GaussianBlur(float2 texCoord : TEXCOORD) : COLOR0
{
    float4 color = float4(0.0f, 0.0f, 0.0f, 0.0f);

    //add the right side offset pixels to the color
    for (int i = 0; i < MAX_RADIUS; i++)
    {
        if(kernel[i] != 0) //this should improve performance for smaller filter radii, but increases the instruction count
            color += tex2D(colorMap, texCoord + offsets[i]) * kernel[i];
    }
    //add the left side offset pixels to the color
    for (int j = 0; j < MAX_RADIUS; j++)
    {
        if(kernel[j] != 0)
            color += tex2D(colorMap, texCoord - offsets[j]) * kernel[j];
    }
    //finally add the weight of the original pixel to the color
    color += tex2D(colorMap, texCoord) * kernel[MAX_RADIUS];

    return color;
}

The if(kernel[i] != 0) increases the number of instructions used dramatically!

So my question is this: What increases instruction count? And why would using an if statement increase instruction count by over 400 in a loop that is only 110 instructions long?

EDIT: Above question edited. I mistakenly thought registers were being taken when it was really instructions. However, the question still applies. What would cause 2 for loops (of length 55 each) to increase the instruction count by over 400 with just 1 added if statement within the loop?

Darkhydro
  • What do you get if you disassemble the compiled HLSL? Exactly how many extra registers is it using? – Andrew Russell Oct 12 '12 at 11:27
  • @AndrewRussell I don't know how I missed this, but it was really instruction count that was going over, not registers. I have rephrased the question appropriately. – Darkhydro Oct 14 '12 at 04:47
  • What shader model are you using? – Andrew Russell Oct 14 '12 at 06:22
  • Just tried a compile with fxc (using ps_3_0, since under this it won't compile due to the register limit). Without the branch it gives me 165 instructions; with the branch I have 275 (adds 2 instructions per iteration, which makes sense). What do you use to compile (and which flags)? – mrvux Oct 14 '12 at 15:16
  • @catflier I am using ps_3_0 as well. I have posted the complete pixel shader since it doesn't seem I was giving enough information... When I compile using Visual Studio with default flags for XNA I get over 940 instructions. – Darkhydro Oct 14 '12 at 22:08

2 Answers


fxc will give you an instruction count. But really, you should do this another way. Try a separable (two-pass) filter, with one pass along U and the other along V?
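
A minimal, untested sketch of what one pass of such a two-pass blur could look like in HLSL. The names blurDirection and PS_BlurOnePass and the kernel layout (centre weight at index 0) are assumptions made for illustration; they are not taken from the original shader.

```hlsl
// Sketch only: blurDirection, PS_BlurOnePass and this kernel layout are hypothetical.
#define MAX_RADIUS 55

sampler2D colorMap;              // source texture (the first pass's result on the second pass)
float  kernel[MAX_RADIUS + 1];   // kernel[0] = centre weight, kernel[i] = weight at distance i
float2 blurDirection;            // (1/width, 0) for the U pass, (0, 1/height) for the V pass

float4 PS_BlurOnePass(float2 texCoord : TEXCOORD0) : COLOR0
{
    // Centre tap.
    float4 color = tex2D(colorMap, texCoord) * kernel[0];

    // Symmetric taps on either side of the centre, along one axis only.
    for (int i = 1; i <= MAX_RADIUS; i++)
    {
        float2 offset = blurDirection * i;
        color += tex2D(colorMap, texCoord + offset) * kernel[i];
        color += tex2D(colorMap, texCoord - offset) * kernel[i];
    }

    return color;
}
```

The idea would be to draw once with blurDirection set to (1/width, 0) into a render target, then draw again with (0, 1/height) while sampling that render target. Each pass then only pays for the taps along one axis.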

bjorke
  • Sorry for the late response, haven't been on SO in a while. A little rusty on my shader code, but if I remember correctly that's what this shader is doing. The first loop is a horizontal pass, the second is a vertical pass. I've simply implemented a few register-saving methods and, in an effort to improve performance, introduced the if statements, which skip over elements that are 0 (for small filter radii there are a LOT of 0's). Maybe you could explain to me more what you were thinking? – Darkhydro Mar 26 '13 at 23:08
  • I mean two distinct passes with two different shaders (or the same shader with different parameters), where you render the first result to a texture and then run the second on those results. – bjorke Mar 30 '13 at 17:23
  • "Planar 2-pass texture mapping and warping." Alvy Ray Smith, 1987 – bjorke Mar 30 '13 at 17:29
  • Ok, I see what you mean. That makes sense. Definitely would reduce the number of instructions used in one pass, but doesn't explain why the if statements are making the instruction count explode. This question is less about finding ways to reduce the instruction count and more about understanding HLSL as best I can. – Darkhydro Apr 05 '13 at 21:31
  • IIRC, all GPU compilers will see the "if" inside the loop and decide that they should unroll the loop. So you'll get a lot of inline unrolled if statements. When in doubt, look at the ASM output from fxc... – bjorke Apr 09 '13 at 16:52

To count instructions you can use FXC.exe. Here's a quick guide.

FXC.exe is now found in the Windows 8 SDK, which ships with VS2012.

On a 64-bit PC FXC.exe lives in this directory: C:\Program Files (x86)\Windows Kits\8.0\bin\x86\fxc.exe

Usage: you can input an FX file and output the assembly plus headers to a text file using this command line:

> FXC.exe C:/Shader.fx /T fx_4_0 /Fx C:/Output.txt

or

> FXC.exe C:/Shader.fx /T fx_4_0 /Cc /Ni /Fc C:/Output.html

to get a cool syntax-highlighted HTML output.

Dr. Andrew Burnett-Thompson
  • I'll dl the SDK if needed, but I have VS2010, so I was wondering if there's another location I can find FXC? Also, I can get the instruction count from XNA when it goes over. – Darkhydro Mar 26 '13 at 23:18
  • I have it too (Where Dr. ABT said it was) but I have VS2012 as well as VS2010. For me: >fxc "*.fx" /T ps_3_0 /E PixelShaderFunction /Zi /Cc /Ni /No /Fx C:/output.html Worked wonders. Thank you so much for the answer! (ps_3_0 for pixelshader version 3.0 and "/E" to mark the pixelshader's function name) I got: "// approximately 104 instruction slots used (14 texture, 90 arithmetic)" at the end of the file. – Lodewijk Oct 28 '13 at 00:02
  • The exe is here: (although you shouldn't trust a stranger's exe!) https://mega.co.nz/#!7FZm2ajK!YFgqoVtKgc6bkyzsARwTrTIlrflaMzEfvGBB5jxMwLs – Lodewijk Oct 28 '13 at 00:06