Questions tagged [fma]

Fused Multiply Add or Multiply-Accumulate

The Fused Multiply Add (also known as Multiply Accumulate) operation performs a multiplication followed by an addition or subtraction as a single operation, with only one rounding at the end.

For example:

x = a * b + c

Without Fused Multiply Add, this would normally be computed with two roundings: one after a * b and one after a * b + c.

Fused Multiply Add combines the two operations into one, performing a single rounding and thereby increasing the accuracy of the computed result.
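A minimal C demonstration of the difference (a sketch; compile with gcc -O2 -ffp-contract=off demo.c -lm so the compiler does not contract the plain expression on its own):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double a = 1.0 + 0x1p-52;   /* 1 + 2^-52 */
        double b = 1.0 - 0x1p-52;   /* 1 - 2^-52; exact product is 1 - 2^-104 */
        double c = -1.0;
        /* a * b rounds to 1.0 first, so the sum collapses to 0 */
        printf("a * b + c    = %.17g\n", a * b + c);
        /* fma keeps the full product, so the tiny residue survives */
        printf("fma(a, b, c) = %.17g\n", fma(a, b, c));   /* -2^-104 */
        return 0;
    }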

Supported Architectures include:

  • PowerPC
  • Intel x86 (via FMA3 instruction set)
  • AMD x86 (via FMA4 and, since Piledriver, FMA3 instruction sets)
82 questions
9 votes • 1 answer

Difference in gcc -ffp-contract options

I have a question regarding the -ffp-contract flag in GNU GCC (see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html). The flag documentation is written as follows: -ffp-contract=off disables floating-point expression contraction.…
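For reference, the flag's effect is visible on a one-line function (a sketch; the function name is mine):

    /* fp_contract.c */
    double mac(double a, double b, double c) { return a * b + c; }

    /* gcc -O2 -mfma -ffp-contract=fast -S fp_contract.c
         -> typically a single vfmadd...sd instruction (one rounding)
       gcc -O2 -mfma -ffp-contract=off  -S fp_contract.c
         -> separate vmulsd + vaddsd (two roundings) */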
8 votes • 2 answers

How do I know if I can compile with FMA instruction sets?

I have seen questions about how to use the FMA instruction set, but before I start using it, I'd first like to know if I can (does my processor support it?). I found a post saying that I needed to look at the output of (working on Linux): more…
user18490 • 3,546
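A quick way to answer this at runtime (a sketch using GCC/Clang's CPU-feature builtin; on Linux you can also run grep fma /proc/cpuinfo):

    #include <stdio.h>

    int main(void) {
        if (__builtin_cpu_supports("fma"))
            puts("FMA3 is supported on this CPU");
        else
            puts("FMA3 is not supported");
        return 0;
    }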
7 votes • 3 answers

How is fma() implemented

According to the documentation, there is a fma() function in math.h. That is very nice, and I know how FMA works and what to use it for. However, I am not so certain how it is implemented in practice. I'm mostly interested in the x86 and x86_64…
the swine • 10,713
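Whether the libm call resolves to a hardware instruction or a software fallback can be probed via the standard FP_FAST_FMA hint from C99 (a sketch; on x86-64, build with -mfma for common toolchains to define the macro):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
    #ifdef FP_FAST_FMA
        puts("fma() is about as fast as a*b + c, i.e. likely one instruction");
    #else
        puts("fma() may be a (correct but slower) software implementation");
    #endif
        return 0;
    }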
7 votes • 3 answers

Will gfortran or ifort compilers wisely use SIMD instructions when summing the product of two arrays?

I've got some code written with numpy, and I'm considering porting it to Fortran for better performance. One operation I do several times is summing the element-wise product of two arrays: sum(A*B) It looks like fused multiply-add instructions…
lnmaurer • 1,687
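As a point of comparison, the C analogue of sum(A*B) below (names are mine) typically compiles to packed vfmadd instructions with gcc -O3 -mfma -ffast-math, where -ffast-math permits reordering the reduction for SIMD:

    double dot(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += a[i] * b[i];   /* multiply-add candidate each iteration */
        return s;
    }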
6 votes • 1 answer

Can C# make use of fused multiply-add?

Does the C# compiler / jitter make use of fused multiply-add operations if they are available on the hardware being used? If it does, are there any particular compiler settings I need to set in order to take advantage of it?
Paul Chernoch • 5,275
5 votes • 0 answers

Clang fused multiply-add depends on constancy of expression arguments

As indicated in the answer to clang 14.0.0 floating point optimizations, Clang since version 14 applies fused multiply add (FMA) instructions even for constant computations performed at compile-time. At the same time, one can observe that the result…
Fedor • 17,146
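The effect the question describes can be reproduced with a constant expression (a sketch; with contraction enabled, Clang's default since version 14, the line below is folded as fma(0.1, 10.0, -1.0)):

    #include <stdio.h>

    int main(void) {
        double x = 0.1 * 10.0 - 1.0;
        /* -ffp-contract=off: the product rounds to exactly 1.0, so x == 0
           contraction enabled: x == 2^-54, about 5.55e-17 */
        printf("%.17g\n", x);
        return 0;
    }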
5 votes • 1 answer

Fastest way to multiply and sum/add two arrays (dot product) - unaligned surprisingly faster than FMA

Hi I have the following code: public unsafe class MultiplyAndAdd : IDisposable { float[] rawFirstData = new float[1024]; float[] rawSecondData = new float[1024]; static int alignment = 32; float[] alignedFirstData = new float[1024 +…
Peter • 37,042
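For contrast with the managed-code version above, an explicit AVX2+FMA inner loop in C looks like this (a sketch assuming the length is a multiple of 8; compile with -mfma -mavx2; not the poster's code):

    #include <immintrin.h>

    float dot_fma(const float *a, const float *b, int n) {
        __m256 acc = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8)   /* 8 floats per 256-bit vector */
            acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                                  _mm256_loadu_ps(b + i), acc);
        /* horizontal sum of the 8 accumulator lanes */
        __m128 v = _mm_add_ps(_mm256_castps256_ps128(acc),
                              _mm256_extractf128_ps(acc, 1));
        v = _mm_hadd_ps(v, v);
        v = _mm_hadd_ps(v, v);
        return _mm_cvtss_f32(v);
    }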
5 votes • 1 answer

Using FMA instructions for an FFT algorithm

I have a bit of C++ code that has become a somewhat useful FFT library over time, and it has been made to run decently fast using SSE and AVX instructions. Granted, it's all only based on a radix-2 algorithm, but it still holds up. My latest itch to…
Kumputer • 588
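The classic place FMA helps an FFT is the twiddle-factor complex multiply; a sketch of the FMA form (function name is mine, not the poster's library):

    #include <math.h>

    /* (ar + i*ai) * (br + i*bi): two muls + two fmas instead of
       four muls + two adds, and each product is rounded only once. */
    static inline void cmul_fma(double ar, double ai, double br, double bi,
                                double *cr, double *ci) {
        *cr = fma(ar, br, -(ai * bi));
        *ci = fma(ar, bi,  ai * br);
    }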
5 votes • 1 answer

Understanding FMA instructions performance

I'm trying to understand how I can max out the number of operations I can get from my CPU. I'm writing a simple matrix multiplication program, and I have a Skylake processor. I was looking at the Wikipedia page for the FLOPS information on this…
Peter L. • 157
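The usual back-of-the-envelope for peak FLOPS on a Skylake-class core (a sketch; the 3.0 GHz clock is an assumption, substitute your part's frequency):

    /* 2 FMA ports x 8 single-precision lanes (256-bit AVX2)
       x 2 flops per FMA x clock frequency */
    double peak = 2.0 * 8.0 * 2.0 * 3.0e9;   /* = 96 GFLOP/s per core */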
5 votes • 2 answers

Why does AVX512-IFMA support only 52-bit ints?

From the value we can infer that it uses the same components as double-precision floating-point hardware. But double has 53 bits of significand, so why is AVX512-IFMA limited to 52 bits? Sure the mantissa has only 52 bits and one bit is hidden, but…
phuclv • 37,963
5 votes • 1 answer

For XMM/YMM FP operation on Intel Haswell, can FMA be used in place of ADD?

This question is for packed, single-prec floating ops with XMM/YMM registers on Haswell. So according to the awesome table put together by Agner Fog, I know that MUL can be done on either port p0 or p1 (with reciprocal throughput of 0.5), while…
codechimp • 1,509
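The trick the question is asking about is to express an add as x*1.0 + y so it can issue on either FMA port (p0/p1) instead of Haswell's single FP-add port, trading FMA's longer latency (5 cycles vs 3) for throughput. A sketch:

    #include <immintrin.h>

    /* Multiplication by 1.0f is exact, so this computes x + y. */
    static inline __m256 add_via_fma(__m256 x, __m256 y) {
        return _mm256_fmadd_ps(x, _mm256_set1_ps(1.0f), y);
    }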
5 votes • 2 answers

Is there any scenario where function fma in libc can be used?

I came across this page and found that there is an odd floating multiply-add function -- fma and fmaf. It says that the result is something like: (x * y) + z #fma(x,y,z) and that the value is computed to infinite precision and rounded once to the result format…
Hongxu Chen • 5,240
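One well-known scenario: recovering the exact rounding error of a product (the TwoProductFMA transformation of Ogita, Rump, and Oishi), which relies on the single rounding that fma guarantees. A sketch:

    #include <math.h>

    /* After the call, p + e equals a*b exactly (barring overflow). */
    double two_product_err(double a, double b, double *p) {
        *p = a * b;                 /* rounded product */
        return fma(a, b, -*p);      /* exact residual; needs true fma */
    }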
5 votes • 1 answer

fmad=false gives good performance

From Nvidia release notes: The nvcc compiler switch, --fmad (short name: -fmad), to control the contraction of floating-point multiplies and add/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA) has been added: …
Sayan • 2,662
4 votes • 1 answer

Multiply and Add Functions

This question is regarding the mad function available in OpenCL, which promises significant improvements for calculations of the type a * b + c when used as mad(a,b,c) and compiled with -cl-mad-enable. I have tried a calculation of the form a + b * c…
Omar Khan • 68
4 votes • 1 answer

Terminology: why "floating multiply-add" instead of "fused multiply-add"?

C11 (and newer): 7.12.13 Floating multiply-add. IEEE 754-2008: fused multiply add, fusedMultiplyAdd. Wikipedia: fused multiply-add. Why does C11 (and newer) use "floating multiply-add" instead of "fused multiply-add"? Where does this "floating" come…
pmor • 5,392