Questions tagged [fma]

Fused Multiply Add or Multiply-Accumulate

The Fused Multiply Add (also known as Multiply Accumulate) operation performs a multiplication followed by an addition or subtraction as a single operation, with only one rounding at the end.

For example:

x = a * b + c

Without Fused Multiply Add, this would normally be computed with two roundings: one after a * b and one after a * b + c.

Fused Multiply Add combines the two operations into a single operation, thereby increasing the accuracy of the computed result.
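The difference between one rounding and two can be seen in a small sketch. This is only an emulation under stated assumptions: the helper names `fused_multiply_add` and `separate_multiply_add` are hypothetical, the fused result is obtained with exact rational arithmetic rather than a hardware FMA instruction, and the inputs are assumed to be finite IEEE 754 doubles.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate FMA (hypothetical helper): exact a*b + c, rounded once.

    Fraction(x) converts a finite float exactly, so the sum below has no
    rounding error; float() then performs the single final rounding.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary multiply followed by add (hypothetical helper)."""
    product = a * b      # first rounding
    return product + c   # second rounding


# Inputs chosen so that the low-order bits of a*b are lost by the
# first rounding: a*b = 1 + 2**-29 + 2**-60 exactly.
a = 1.0 + 2.0**-30
b = a
c = -(1.0 + 2.0**-29)

unfused = separate_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print("two roundings:", unfused)
print("one rounding: ", fused)
```

With these inputs the two-step version loses the 2**-60 term when the product is rounded, while the single-rounding version keeps it. On real hardware the same effect comes from an FMA instruction rather than from rational arithmetic.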

Supported Architectures include:

  • PowerPC
  • Intel x86 (via FMA3 instruction set)
  • AMD x86 (via FMA4 instruction set on Bulldozer-family processors; FMA3 on Piledriver and later)
82 questions
0
votes
1 answer

GCC inclusion of AVX512's "Fused Multiply Add" instructions when compiling for Cascade-Lake processors

According to gcc's documentation, compiling with "-march=cascadelake" does not enable the flag -AVX512IFMA (which, if I understand correctly, enables support for AVX512's FMA instructions). In contrast, this flag is included for example when compiling…
Borbei
  • 219
  • 3
  • 7
0
votes
1 answer

How to disable fma3 instructions in gcc

I need to disable FMA3 instructions (for backward compatibility) on a 64-bit system. I've used _set_FMA3_enable(0) in my Windows environment. What option (or macro) do I need to use to disable FMA3 in gcc? For example. #include…
0
votes
1 answer

How to use fused multiply and add in AVX for 16 bit packed integers

I know it is possible to do multiply-and-add using a single instruction in AVX2. I want to use a multiply-and-add instruction where each 256-bit AVX2 variable is packed with 16 16-bit variables. For instance, consider the example…
Rick
  • 361
  • 5
  • 17
0
votes
1 answer

Vectorization flags with Eigen and IPOPT

I have some C++ function that I am optimizing with IPOPT. Although the cost function, constraint functions, etc. are written in C++, the code was originally written to use the C-interface. I haven't bothered to change that yet unless it turns out to…
bremen_matt
  • 6,902
  • 7
  • 42
  • 90
0
votes
0 answers

What are the pipeline stages for Fused Multiply Add?

An FMA operation (A*B + C) can be done in 5 cycles on Intel's Haswell architecture. Can anyone explain what happens in each of the 5 cycles? For a Multiply I know that the stages are as follows: Separate Mantissa and Exponent Multiply…
Amir
  • 1
  • 1
0
votes
2 answers

Using AVX with GCC: __builtin_ia32_addpd256 not declared

If I #include I get this error: error: '__builtin_ia32_addpd256' was not declared in this scope I have defined __AVX__ and __FMA__ macros to make AVX available, but apparently this isn't enough. There is no error if I use compiler…
Violet Giraffe
  • 32,368
  • 48
  • 194
  • 335
-4
votes
1 answer

Can floating point computation be used in any reliable function notably containers and algorithms?

In C and C++, floating point computations are non-deterministic by default, as not even the true datatype is chosen by the user: for any intermediate computation of a FP subexpression, the compiler can choose to represent a value with higher precision…
curiousguy
  • 8,038
  • 2
  • 40
  • 58