Questions tagged [fast-math]

The `-ffast-math` (or similarly-named) compiler option trades precision and adherence to the IEEE 754 floating-point standard for execution speed

Most compilers have an option for turning on floating-point optimizations which sacrifice computational precision and/or adherence to the costlier corner cases of the IEEE 754 floating-point standard in favor of better execution speed.

  • For gcc and clang, this option is named -ffast-math (and there are sub-options)
  • For nvcc, the name is --use_fast_math
  • For OpenCL compilation, the name is -cl-fast-relaxed-math

For more information: What does gcc's ffast-math actually do?

44 questions
7 votes, 1 answer

Do denormal flags like Denormals-Are-Zero (DAZ) affect comparisons for equality?

If I have 2 denormal floating point numbers with different bit patterns and compare them for equality, can the result be affected by the Denormals-Are-Zero flag, the Flush-to-Zero flag, or other flags on commonly used processors? Or do these flags…
Zachary Burns
6 votes, 2 answers

Dynamic -ffast-math

Is it possible to selectively turn -ffast-math on/off during runtime? For example, creating classes FastMath and AccurateMath with the common base class Math, so that one would be able to use both implementations during runtime? Ditto for flushing…
Michael
5 votes, 0 answers

How can I find math operations that would be optimized by `-ffast-math`?

The -ffast-math C++ compiler option allows the compiler to perform more math optimizations that may slightly change behavior. For example, x * 10 / 10 should cancel but due to the possibility of overflow it would slightly change behavior, and x /…
Aaron Franke
5 votes, 2 answers

When should I use gcc's -Ofast optimization level?

In Xcode 5, the optimization settings introduce a new level named -Ofast (Fastest, Aggressive Optimizations). When and how should I use this level?
user49354
5 votes, 1 answer

Does -use-fast-math option translate SP multiplications to intrinsics?

I had a quick glance at the CUDA Programming Guide w.r.t. -use-fast-math optimizations; although Appendix C mentions divisions being converted to an intrinsic, there is no mention of multiplications. The reason I ask this question is, my…
Sayan
4 votes, 0 answers

Flag to generate 'deterministic' floating point operations wrt. pointer alignment with 'fast-math'?

The -ffast-math option in gcc allows the compiler to reorder floating-point operations to have a faster execution. This may lead to slight differences between the results of these operations depending on the alignment of pointers. For instance, on…
Marc
4 votes, 1 answer

What is GCC/Clang equivalent of -fp-model fast=1 in ICC

As I read on Intel's website: Intel compiler uses /fp-model fast=1 as defaults. This optimization favors speed over standards compliance. You may use compiler option -mieee-fp to get compliant code. My understanding of the fp-model option in…
marcin
3 votes, 1 answer

Fortran code compiled in one Windows machine (2018 Intel processor) gives different results when exe is copied to other machine (2022 Intel processor)

Is it possible that a Fortran code compiled on one Windows machine with Visual Studio 2019 on a 2018 Intel processor gives a slightly different result when the exe is copied to another machine (that has a 2022 Intel processor)? Could you please list…
Millemila
3 votes, 1 answer

Why is std::inner_product slower than the naive implementation?

This is my naive implementation of dot product:

```cpp
float simple_dot(int N, float *A, float *B) {
    float dot = 0;
    for (int i = 0; i < N; ++i) {
        dot += A[i] * B[i];
    }
    return dot;
}
```

And this is using the C++ library: float…
ijklr
3 votes, 1 answer

OpenCL Fast Relaxed Math

What does the OpenCL compiler option -cl-fast-relaxed-math do? From reading the documentation - it looks like -cl-fast-relaxed-math allows a kernel to do floating point math on any variables - even if those variables point to the wrong data type,…
benshope
  • 2,936
  • 4
  • 27
  • 39
2 votes, 1 answer

Why can't the Rust compiler auto-vectorize this FP dot product implementation?

Let's consider a simple reduction, such as a dot product:

```rust
pub fn add(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).fold(0.0, |c, (x, y)| c + x * y)
}
```

Using rustc 1.68 with -C opt-level=3 -C target-feature=+avx2,+fma I get .LBB0_5: …
Unlikus
2 votes, 2 answers

Mingw32 std::isnan with -ffast-math

I am compiling the following code with the -ffast-math option:

```cpp
#include <iostream>
#include <cmath>
#include <limits>

int main() {
    std::cout << std::isnan(std::numeric_limits<double>::quiet_NaN()) << std::endl;
}
```

I am getting 0 as output. How…
André Puel
2 votes, 0 answers

Why does -fno-signed-zeros have an effect on vectorization for minimum search?

See this simple minimum search (Godbolt):

```cpp
float foo(const float *data, int n) {
    float v = data[0];
    for (int i = 1; i < n; i++) {
        float d = data[i];
        if (d < v) {
            v = d;
        }
    }
    return v;
}
```

Neither gcc…
geza
2 votes, 2 answers

Where is the source of imprecise calculation in the assembler code of gcc -Ofast compared with -O3?

The following 3 lines give imprecise results with "gcc -Ofast -march=skylake":

```cpp
int32_t i = -5;
const double sqr_N_min_1 = (double)i * i;
1. - ((double)i * i) / sqr_N_min_1
```

Obviously, sqr_N_min_1 gets 25., and in the 3rd line (-5 * -5) / 25 should…
Hartmut Pfitzinger
2 votes, 1 answer

What does the "denormal input" exactly mean in assembly when we consider using DAZ flag for SSE Floating Points

I've read This article and do-denormal-flags-like-denormals-are-zero-daz-affect-comparisons-for-equality and I understand the usage and difference between FTZ and DAZ flags. DAZ applies on input, FTZ on output from an FP operation. What confused me…
lionel