3

I am writing a GLSL fragment shader that uses shadow mapping. Following this tutorial, http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-16-shadow-mapping/ , I wrote this line to compute the shadow bias that avoids shadow acne:

float bias = 0.005 * tan( acos ( N_L_dot ) );

But I know from math that

tan ( acos ( x ) ) = sqrt ( 1 - x^2 ) / x
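For reference, here is a short sketch of why the identity holds. The symbol θ is introduced only for this derivation; x stands for N_L_dot. It relies on acos returning an angle in [0, π], where the sine is never negative, and it requires x ≠ 0.

```latex
\theta = \arccos x
\;\Longrightarrow\;
\cos\theta = x,
\qquad
\sin\theta = \sqrt{1 - \cos^2\theta} = \sqrt{1 - x^2}
\quad (\theta \in [0, \pi])

\tan(\arccos x) = \frac{\sin\theta}{\cos\theta} = \frac{\sqrt{1 - x^2}}{x},
\qquad x \neq 0
```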

Would it be faster to use this identity instead of tan and acos? In practice, that means using this line of code:

float bias = 0.005 * sqrt ( 1.f - N_L_dot * N_L_dot ) / N_L_dot ;

I think my question boils down to "Is the GPU faster at sqrt and division or at tan and acos?" ...or am I missing something?

darius
  • 1
    Well, have you tried benchmarking it on the hardware you care about? Odds are the answer varies significantly (from "#1 is faster" through "about equally fast" to "#2 is faster") depending on the hardware, the rest of the code (modern GPUs are apparently quite good at staying busy while waiting for memory), etc. – delnan Jul 13 '13 at 14:17
  • @delnan Well, no, I wanted to know if there was an answer independent of the hardware, because I can't test on many different GPUs and I'd like my software to run fast on any modern high-end GPU. – darius Jul 13 '13 at 14:29
  • 2
    I think you might actually be able to remove the sqrt (my math is a little rusty) by squaring the bias function and squaring whatever you compare it to. If you manage to do this, the new function will be faster. (Not sure what the impact is on float precision and such.) – Full Frontal Nudity Jul 14 '13 at 10:11
  • @FullFrontalNudity Ha! Thanks there! I think that's a good suggestion! And if I square everything before the comparison it shouldn't affect float precision. – darius Jul 14 '13 at 14:28
  • So can you report back on your findings? – Full Frontal Nudity Jul 15 '13 at 22:24
  • @FullFrontalNudity I don't have good results for now. I did some testing in QuickShader, and there wasn't any difference. But there wasn't any difference even when I removed the whole bias thing... So I added a for loop around the bias line, but still didn't notice any performance difference. I don't think that's a good test, by the way. I'll do some real testing in a game engine, but I can't right now. – darius Jul 19 '13 at 14:23

1 Answer

8

AMD GPU Shader Analyzer shows that float bias = 0.005 * sqrt ( 1.f - N_L_dot * N_L_dot ) / N_L_dot ; compiles to fewer instructions in the shader assembly (4 instructions, estimated at 4 clock cycles).

By comparison, float bias = 0.005 * tan( acos ( N_L_dot ) ); generated 15 instructions, estimated at 8 clock cycles.

I compiled both versions against the Radeon HD 6450 assembly target, but the results seemed to track well across the different Radeon HD cards.

Looks like the sqrt method will generally perform better.

Aaron Hagan