Questions tagged [fma]

Fused Multiply Add or Multiply-Accumulate

The Fused Multiply Add (also known as Multiply Accumulate) operation performs a multiplication followed by an addition or subtraction as a single operation, with only one rounding at the end.

For example:

x = a * b + c

Without Fused Multiply Add, this would normally be computed with two roundings: one after a * b and one after a * b + c.

Fused Multiply Add combines the two operations into one, performing a single rounding and thereby increasing the accuracy of the computed result.
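A minimal C demonstration of the difference (a sketch; compile with gcc -O2 -ffp-contract=off demo.c -lm so the compiler does not contract the plain expression on its own):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double a = 1.0 + 0x1p-52;   /* 1 + 2^-52 */
        double b = 1.0 - 0x1p-52;   /* 1 - 2^-52; exact product is 1 - 2^-104 */
        double c = -1.0;
        /* a * b rounds to 1.0 first, so the sum collapses to 0 */
        printf("a * b + c    = %.17g\n", a * b + c);
        /* fma keeps the full product, so the tiny residue survives */
        printf("fma(a, b, c) = %.17g\n", fma(a, b, c));   /* -2^-104 */
        return 0;
    }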

Supported Architectures include:

  • PowerPC
  • Intel x86 (via FMA3 instruction set)
  • AMD x86 (via FMA4 and, since Piledriver, FMA3 instruction sets)
82 questions
9 votes • 1 answer

Difference in gcc -ffp-contract options

I have a question regarding the -ffp-contract flag in GNU GCC (see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html). The flag documentation is written as follows: -ffp-contract=off disables floating-point expression contraction.…
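For reference, the flag's effect is visible on a one-line function (a sketch; the function name is mine):

    /* fp_contract.c */
    double mac(double a, double b, double c) { return a * b + c; }

    /* gcc -O2 -mfma -ffp-contract=fast -S fp_contract.c
         -> typically a single vfmadd...sd instruction (one rounding)
       gcc -O2 -mfma -ffp-contract=off  -S fp_contract.c
         -> separate vmulsd + vaddsd (two roundings) */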
8 votes • 2 answers

How do I know if I can compile with FMA instruction sets?

I have seen questions about how to use the FMA instruction set, but before I start using it, I'd first like to know if I can (does my processor support it?). I found a post saying that I needed to look at the output of (working on Linux): more…
user18490 • 3,546
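A quick way to answer this at runtime (a sketch using GCC/Clang's CPU-feature builtin; on Linux you can also run grep fma /proc/cpuinfo):

    #include <stdio.h>

    int main(void) {
        if (__builtin_cpu_supports("fma"))
            puts("FMA3 is supported on this CPU");
        else
            puts("FMA3 is not supported");
        return 0;
    }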
7 votes • 3 answers

How is fma() implemented

According to the documentation, there is a fma() function in math.h. That is very nice, and I know how FMA works and what to use it for. However, I am not so certain how it is implemented in practice. I'm mostly interested in the x86 and x86_64…
the swine • 10,713
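Whether the libm call resolves to a hardware instruction or a software fallback can be probed via the standard FP_FAST_FMA hint from C99 (a sketch; on x86-64, build with -mfma for common toolchains to define the macro):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
    #ifdef FP_FAST_FMA
        puts("fma() is about as fast as a*b + c, i.e. likely one instruction");
    #else
        puts("fma() may be a (correct but slower) software implementation");
    #endif
        return 0;
    }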
7 votes • 3 answers

Will gfortran or ifort compilers wisely use SIMD instructions when summing the product of two arrays?

I've got some code written with numpy, and I'm considering porting it to Fortran for better performance. One operation I do several times is summing the element-wise product of two arrays: sum(A*B) It looks like fused multiply-add instructions…
lnmaurer • 1,687
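As a point of comparison, the C analogue of sum(A*B) below (names are mine) typically compiles to packed vfmadd instructions with gcc -O3 -mfma -ffast-math, where -ffast-math permits reordering the reduction for SIMD:

    double dot(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += a[i] * b[i];   /* multiply-add candidate each iteration */
        return s;
    }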
6 votes • 1 answer

Can C# make use of fused multiply-add?

Does the C# compiler / jitter make use of fused multiply-add operations if they are available on the hardware being used? If it does, are there any particular compiler settings I need to set in order to take advantage of it?
Paul Chernoch • 5,275
5 votes • 0 answers

Clang fused multiply-add depends on constancy of expression arguments

As indicated in the answer to clang 14.0.0 floating point optimizations, Clang since version 14 applies fused multiply add (FMA) instructions even for constant computations performed at compile-time. At the same time, one can observe that the result…
Fedor • 17,146
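The effect the question describes can be reproduced with a constant expression (a sketch; with contraction enabled, Clang's default since version 14, the line below is folded as fma(0.1, 10.0, -1.0)):

    #include <stdio.h>

    int main(void) {
        double x = 0.1 * 10.0 - 1.0;
        /* -ffp-contract=off: the product rounds to exactly 1.0, so x == 0
           contraction enabled: x == 2^-54, about 5.55e-17 */
        printf("%.17g\n", x);
        return 0;
    }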
5 votes • 1 answer

Fastest way to multiply and sum/add two arrays (dot product) - unaligned surprisingly faster than FMA

Hi I have the following code: public unsafe class MultiplyAndAdd : IDisposable { float[] rawFirstData = new float[1024]; float[] rawSecondData = new float[1024]; static int alignment = 32; float[] alignedFirstData = new float[1024 +…
Peter • 37,042
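For contrast with the managed-code version above, an explicit AVX2+FMA inner loop in C looks like this (a sketch assuming the length is a multiple of 8; compile with -mfma -mavx2; not the poster's code):

    #include <immintrin.h>

    float dot_fma(const float *a, const float *b, int n) {
        __m256 acc = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8)   /* 8 floats per 256-bit vector */
            acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                                  _mm256_loadu_ps(b + i), acc);
        /* horizontal sum of the 8 accumulator lanes */
        __m128 v = _mm_add_ps(_mm256_castps256_ps128(acc),
                              _mm256_extractf128_ps(acc, 1));
        v = _mm_hadd_ps(v, v);
        v = _mm_hadd_ps(v, v);
        return _mm_cvtss_f32(v);
    }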
5 votes • 1 answer

Using FMA instructions for an FFT algorithm

I have a bit of C++ code that has become a somewhat useful FFT library over time, and it has been made to run decently fast using SSE and AVX instructions. Granted, it's all only based on a radix-2 algorithm, but it still holds up. My latest itch to…
Kumputer • 588
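The classic place FMA helps an FFT is the twiddle-factor complex multiply; a sketch of the FMA form (function name is mine, not the poster's library):

    #include <math.h>

    /* (ar + i*ai) * (br + i*bi): two muls + two fmas instead of
       four muls + two adds, and each product is rounded only once. */
    static inline void cmul_fma(double ar, double ai, double br, double bi,
                                double *cr, double *ci) {
        *cr = fma(ar, br, -(ai * bi));
        *ci = fma(ar, bi,  ai * br);
    }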
5 votes • 1 answer

Understanding FMA instructions performance

I'm trying to understand how I can max out the number of operations I can get from my CPU. I'm writing a simple matrix multiplication program, and I have a Skylake processor. I was looking at the Wikipedia page for the FLOPS information on this…
Peter L. • 157
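The usual back-of-the-envelope for peak FLOPS on a Skylake-class core (a sketch; the 3.0 GHz clock is an assumption, substitute your part's frequency):

    /* 2 FMA ports x 8 single-precision lanes (256-bit AVX2)
       x 2 flops per FMA x clock frequency */
    double peak = 2.0 * 8.0 * 2.0 * 3.0e9;   /* = 96 GFLOP/s per core */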
5 votes • 2 answers

Why does AVX512-IFMA support only 52-bit ints?

From the value we can infer that it uses the same components as double-precision floating-point hardware. But double has 53 bits of significand, so why is AVX512-IFMA limited to 52 bits? Sure the mantissa has only 52 bits and one bit is hidden, but…
phuclv • 37,963
5 votes • 1 answer

For XMM/YMM FP operation on Intel Haswell, can FMA be used in place of ADD?

This question is for packed, single-prec floating ops with XMM/YMM registers on Haswell. So according to the awesome table put together by Agner Fog, I know that MUL can be done on either port p0 or p1 (with reciprocal throughput of 0.5), while…
codechimp • 1,509
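The trick the question is asking about is to express an add as x*1.0 + y so it can issue on either FMA port (p0/p1) instead of Haswell's single FP-add port, trading FMA's longer latency (5 cycles vs 3) for throughput. A sketch:

    #include <immintrin.h>

    /* Multiplication by 1.0f is exact, so this computes x + y. */
    static inline __m256 add_via_fma(__m256 x, __m256 y) {
        return _mm256_fmadd_ps(x, _mm256_set1_ps(1.0f), y);
    }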
5 votes • 2 answers

Is there any scenario where function fma in libc can be used?

I came across this page and found that there is an odd floating multiply-add function -- fma and fmaf. It says that the result is something like: (x * y) + z #fma(x,y,z) and that the value is computed to infinite precision and rounded once to the result format…
Hongxu Chen • 5,240
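One well-known scenario: recovering the exact rounding error of a product (the TwoProductFMA transformation of Ogita, Rump, and Oishi), which relies on the single rounding that fma guarantees. A sketch:

    #include <math.h>

    /* After the call, p + e equals a*b exactly (barring overflow). */
    double two_product_err(double a, double b, double *p) {
        *p = a * b;                 /* rounded product */
        return fma(a, b, -*p);      /* exact residual; needs true fma */
    }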
5 votes • 1 answer

fmad=false gives good performance

From Nvidia release notes: The nvcc compiler switch, --fmad (short name: -fmad), to control the contraction of floating-point multiplies and add/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA) has been added: …
Sayan • 2,662
4 votes • 1 answer

Multiply and Add Functions

This question is regarding the mad function available in OpenCL, which promises significant improvements for calculations of the type a * b + c when used as mad(a,b,c) and compiled with -cl-mad-enable. I have tried a calculation of the form a + b * c…
Omar Khan • 68
4 votes • 1 answer

Terminology: why "floating multiply-add" instead of "fused multiply-add"?

C11 (and newer): 7.12.13 Floating multiply-add. IEEE 754-2008: fused multiply add, fusedMultiplyAdd. Wikipedia: fused multiply-add. Why does C11 (and newer) use "floating multiply-add" instead of "fused multiply-add"? Where does this "floating" come…
pmor • 5,392