Questions tagged [fma]

Fused Multiply-Add or Multiply-Accumulate

Fused Multiply-Add (FMA), also known as Multiply-Accumulate, performs a multiplication followed by an addition or subtraction as a single operation, with only one rounding at the end.

For example:

x = a * b + c

Without Fused Multiply-Add, this would normally be computed with two roundings: one after a * b and another after adding c.

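Fused Multiply-Add combines the two operations into a single operation, so the intermediate product a * b is never rounded. The result is the exact value of a * b + c rounded once, which is at least as accurate as, and often more accurate than, the two-rounding result.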

Supported architectures include:

PowerPC
Intel x86 (via the FMA3 instruction set)
AMD x86 (via the FMA4 instruction set; later AMD processors also support FMA3)

82 questions

1 answer

GCC inclusion of AVX512's "Fused Multiply Add" instructions when compiling for Cascade-Lake processors

According to GCC's documentation, compiling with "-march=cascadelake" does not enable the flag -AVX512IFMA (which, if I understand correctly, enables support for AVX512's FMA instructions). In contrast, this flag is included, for example, when compiling…

asked Dec 19 '20 at 13:50

Borbei

1 answer

How to disable fma3 instructions in gcc

I need to disable FMA3 instructions (for backward-compatibility reasons) on a 64-bit system. In my Windows environment I've used _set_FMA3_enable(0). What option (or macro) do I need to use to disable FMA3 in gcc? For example: #include…

c++ gcc cross-platform fma

asked Oct 05 '20 at 11:12

Jacob Jacob

1 answer

How to use fused multiply and add in AVX for 16 bit packed integers

I know it is possible to do a multiply-and-add using a single instruction in AVX2. I want to use a multiply-and-add instruction where each 256-bit AVX2 variable is packed with sixteen 16-bit values. For instance, consider the example…

c performance intel avx2 fma

asked Jul 31 '19 at 09:11

Rick

1 answer

Vectorization flags with Eigen and IPOPT

I have some C++ function that I am optimizing with IPOPT. Although the cost function, constraint functions, etc. are written in C++, the code was originally written to use the C-interface. I haven't bothered to change that yet unless it turns out to…

eigen avx eigen3 ipopt fma

asked Mar 23 '18 at 09:45

bremen_matt

0 answers

What are the pipeline stages for Fused Multiply Add?

An FMA operation (A*B + C) can be done in 5 cycles on Intel's Haswell architecture. Can anyone explain what happens in each of the 5 cycles? For a multiply, I know that the stages are as follows: Separate Mantissa and Exponent, Multiply…

architecture intel simd hpc fma

asked May 01 '17 at 18:06

Amir

2 answers

Using AVX with GCC: __builtin_ia32_addpd256 not declared

If I #include I get this error: error: '__builtin_ia32_addpd256' was not declared in this scope. I have defined the __AVX__ and __FMA__ macros to make AVX available, but apparently this isn't enough. There is no error if I use compiler…

c++ gcc avx fma

asked Sep 18 '13 at 08:30

Violet Giraffe

-4 votes

1 answer

Can floating point computation be used in any reliable function notably containers and algorithms?

In C and C++, floating-point computations are non-deterministic by default: the user does not even choose the actual type, since for any intermediate computation of an FP subexpression, the compiler can choose to represent a value with higher precision…

c++ floating-point precision predicate fma

asked May 29 '19 at 03:30

curiousguy
