
Intel® 64 and IA-32 Architectures Optimization Reference Manual lists latency and throughput figures for various CPU instructions.

For transcendental functions (FSIN, etc.), some of the figures are listed as ranges (page C-29). Footnote 4 explains:

Latency and Throughput of transcendental instructions can vary substantially in a dynamic execution environment. Only an approximate value or a range of values are given for these instructions.

My question is: what factors affect the throughput and latency of such instructions? I imagine the value of the argument is one factor. Are there any others?

NPE

2 Answers


Besides the argument, the mix of other instructions in flight can affect latency and throughput. These instructions are microcoded: each one expands into a sequence of µops that must compete with other instructions for ALU resources, and when such contention occurs, performance suffers.

Stephen Canon
    Beat me to it; I was about to say much the same thing. The only thing I would add is that operations like `FSIN` might be implemented by some kind of successive approximation, such as evaluating a short series. That would mean multiple steps requiring internal resources, and thus more chances to 'clash' with other ops. – JasonD Jan 22 '13 at 21:30
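As a rough illustration of the "short series" idea, here is a hypothetical Python sketch. The real FSIN microcode is not public, so a plain Taylor series stands in for whatever Intel actually uses, and the function name `sin_by_series` is made up for this example. Each loop iteration needs the previous term, so the steps form a serial dependency chain, and the number of iterations grows with the magnitude of the reduced argument. That is one way a data-dependent latency could arise.

```python
import math


def sin_by_series(x: float, rel_tol: float = 1e-17) -> tuple[float, int]:
    """Approximate sin(x) with a Taylor series and also return the step count.

    Hypothetical model only: each loop iteration stands in for a short chain
    of dependent multiply/divide/add operations inside the hardware.
    """
    # Naive argument reduction to r in [-pi/2, pi/2]; sin(x) = (-1)**k * sin(r).
    k = round(x / math.pi)
    r = x - k * math.pi
    r2 = r * r

    term = r   # current series term (-1)**n * r**(2n+1) / (2n+1)!
    total = r
    steps = 0
    n = 1
    # Keep adding terms until the latest one is negligible relative to the sum.
    while abs(term) > rel_tol * abs(total):
        term *= -r2 / ((2 * n) * (2 * n + 1))  # depends on the previous term
        total += term
        n += 1
        steps += 1
    return (-total if k % 2 else total), steps


for x in (1e-3, 0.5, 1.5):
    value, steps = sin_by_series(x)
    print(f"x={x}: sin~{value!r}, steps={steps}, math.sin={math.sin(x)!r}")
```

A small argument such as 1e-3 converges in a few iterations, while 1.5 needs noticeably more, so in this model the "latency" depends on the input.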

The x87 control word specifies the precision of computations (64-bit, 53-bit, or 24-bit mantissa), and this setting can affect the performance of transcendental instructions, especially those that internally use division or square root. In general, I advise avoiding the trigonometric x87 instructions: by design, their argument reduction uses a finite-precision approximation of π, so they are very inaccurate for large input values.
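To see why a finite-precision π hurts large inputs, here is a minimal sketch that performs the range reduction r = x − kπ exactly with Python's `fractions`, once with a high-precision π and once with π truncated to 66 significant bits. The 66-bit constant is an assumption (it is the value commonly reported for x87 argument reduction, not something stated in this answer), and the helper names are hypothetical. The gap between the two π values is about 4e-21, and it is multiplied by k, so the error in the reduced argument grows linearly with x.

```python
import math
from fractions import Fraction

# pi to 50 decimal places; its ~1e-50 error is negligible next to the
# ~4e-21 gap being measured, so it serves as the "exact" reference here.
PI_REF = Fraction("3.14159265358979323846264338327950288419716939937510")

# Assumption: pi truncated to 66 significant bits, 0xC90FDAA22168C234C / 2**66,
# the approximation commonly reported for x87 argument reduction.
PI_66 = Fraction(0xC90FDAA22168C234C, 2**66)


def reduced_arguments(x: float) -> tuple[Fraction, Fraction]:
    """Return r = x - k*pi computed exactly with PI_REF and with PI_66."""
    xf = Fraction(x)         # exact value of the double
    k = round(xf / PI_REF)   # nearest multiple of pi, so |r| <= pi/2
    return xf - k * PI_REF, xf - k * PI_66


def absolute_error(x: float) -> float:
    """Equals |k| * (PI_REF - PI_66), so it grows linearly with |x|."""
    r_ref, r_66 = reduced_arguments(x)
    return float(abs(r_66 - r_ref))


def relative_error(x: float) -> float:
    """Relative error of the reduced argument; for small r, sin(r) ~ r,
    so this is also roughly the relative error of the resulting sine."""
    r_ref, r_66 = reduced_arguments(x)
    return float(abs(r_66 - r_ref) / abs(r_ref))


for x in (math.pi, 1e6, 1e15):
    print(f"x={x:<22} abs err={absolute_error(x):.3e} rel err={relative_error(x):.3e}")
```

For x = math.pi the true reduced argument is only about 1.2e-16, so even the single-multiple error gives a relative error on the order of 1e-5 in the sine; for larger x the absolute error scales with x/π.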

Marat Dukhan