There are standard libraries of shader functions, such as for Cg. But are there resources which tell you how long each takes... I'm thinking similar to how you used to be able to look up how many cycles each ASM op would take.
-
It varies from device to device – Flexo Dec 07 '11 at 12:24
-
I guess what matters more nowadays is how many processors you can keep loaded. If you set up a heavy shader with dependent texture reads, you'll get stalls and waits and other unpleasant stuff. Pure "cycle count" won't help in this case. – Lyth Dec 07 '11 at 12:31
-
Both true - but some idea would still be useful as a starting point. For instance the mathematical/geometrical functions... how does division compare against cos, or cos against acos, or sqrt against tan... – Mr. Boy Dec 07 '11 at 12:40
-
@John I guess they generally take more or less the same number of cycles, but I cannot say how they perform in comparison to simple MULs and ADDs. What I do know is that, at least in NVIDIA's architectures, the multiprocessors have fewer ALUs for transcendental functions (like sin, sqrt, ...) than simple MUL/ADD ALUs (maybe 8:1 or 4:1). So they can perform more MULs/ADDs in parallel than SINs/SQRTs. Other than that, you might have to ask the individual GPU developers. – Christian Rau Dec 07 '11 at 14:01
1 Answer
There are no reliable resources that will tell you how long various standard shader functions take. Not even for a particular piece of hardware.
The reason for this has to do with instruction scheduling and the way modern shader architectures work. Take a simple sin function. Let's say that the hardware has special hardware to compute the sine of a value, so it's not manually using a Taylor series or something. However, let's also say that it takes a sequence of 4 opcodes to actually compute it. Therefore, sin would take "4 cycles".
However, all of those opcodes are scalar operations. Therefore, while they're going on, you could in fact have some 3-vector dot-products, or in the case of some hardware, 4-vector dot-products going on at the same time, on the same processor. Therefore, if the hardware can issue 4-vector dot-products alongside scalar operations, the number of cycles it takes to execute a sin and a matrix-vector multiply is... still 4.
So how much did the sin operation cost? If you take out the matrix multiply, nothing gets faster. If you take out the sin, still nothing gets faster. How much does it cost? You can't say, because the cost of a single operation is irrelevant; the only measurable quantity is the cost of the shader itself.
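To make the argument above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not a description of any real GPU: it assumes one scalar unit and one 4-wide vector unit that can each issue one op per cycle in parallel, with no dependencies between the two streams. The function name `shader_cycles` and the op counts are hypothetical, chosen to match the 4-opcode sin from the example.

```python
# Toy co-issue model (an illustrative assumption, not real hardware):
# a scalar unit and a 4-wide vector unit each issue one op per cycle, in parallel.

def shader_cycles(scalar_ops: int, vector_ops: int) -> int:
    """Cycles for a shader whose scalar and vector ops are independent and co-issue."""
    return max(scalar_ops, vector_ops)

SIN_SCALAR_OPS = 4   # the hypothetical 4-opcode sin from the answer
MAT4_VEC4_DOTS = 4   # a 4x4 matrix times a 4-vector is four 4-vector dot-products

both = shader_cycles(SIN_SCALAR_OPS, MAT4_VEC4_DOTS)
sin_only = shader_cycles(SIN_SCALAR_OPS, 0)
matmul_only = shader_cycles(0, MAT4_VEC4_DOTS)

print(both, sin_only, matmul_only)  # 4 4 4
```

In this model all three variants take 4 cycles, so subtracting one variant from another would suggest that either operation is "free". Real hardware adds latency, data dependencies and memory stalls on top, which only makes per-function numbers less meaningful.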
Ultimately, all you can do is try to build your shader reasonably and see what the performance is. Unless you have low-level debugging tools to inspect the underlying shader assembly (and no, DX assembly isn't good enough), that's really the best you can do.
