
The same C# floating-point code can produce different results in different environments.

This question is not about why 0.1 + 0.2 != 0.3 or about the inherent imprecision of floating-point numbers.

It is rather linked to the fact that the same C# code, with the same target architecture (x64 for instance), may lead to different results depending on the actual machine / processor that is used.

This question is directly related to this one: "Is floating-point math consistent in C#? Can it be?", in which the C# problem is discussed.

For reference, this paragraph in the C# specification is explicit about that risk:

Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects
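For illustration, here is a toy example (not our actual code) of the kind of expression where such excess intermediate precision becomes visible; what it prints depends on how the JIT keeps intermediates:

```csharp
using System;

// If intermediates are kept in an 80-bit x87 register, 1e16 + 1.0 is exact and the
// expression evaluates to 1.0; if every operation is rounded to a 64-bit double
// (e.g. SSE2), 1e16 + 1.0 rounds back to 1e16 and the expression evaluates to 0.0.
double big = 1e16;
double diff = big + 1.0 - big;
Console.WriteLine(diff);
```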

Indeed, we actually experienced a difference on the order of 1e-14 in an algorithm using only double, and we are afraid that this discrepancy will propagate through other iterative algorithms that use this result, and so forth, making our results not consistently reproducible for the quality/legal requirements we have in our field (medical imaging research).

C# and F# share the same IL and the same runtime; however, as far as I understand, this behavior may be driven more by the compiler, which is different for F# and C#.

I don't feel savvy enough to understand whether the root of the issue is common to both, or whether there is hope for F#, should we take the leap into F# to help us solve this.

TL;DR

This inconsistency problem is explicitly described in the C# language specification. We have not found the equivalent in the F# specification (but we may not have searched in the right place).

Is there more consistency in F# in this regard?

i.e., if we switch to F#, are we guaranteed to get more consistent results in floating-point calculations across architectures?

Pac0
  • I'm guessing this depends on a stage further along than the language and compiler to IL used. I'd be very surprised if the rules applied to F# and C# were different. – InBetween May 20 '20 at 14:27
  • If your calculations are that important, why not use `decimal` instead of a floating point type? – juharr May 20 '20 at 14:33
  • @juharr the `decimal` type has very poor performance for calculation-intensive algorithms (it's mentioned in the linked post as well), but of course we may need to actually measure this performance hit. – Pac0 May 20 '20 at 14:35
  • As you can see [here](https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/basic-types) the basic F# primitives `decimal`, `float32`, `single`, `float` and `double` all map to regular .NET floating point types, which means they will have the exact same limitations and performance characteristics as in C#. If you're using other types than that then I have no idea. – Lasse V. Karlsen May 20 '20 at 14:41
  • The basic IL instructions are also the same, and while the F# compiler might optimize and generate code sequences slightly different from C#, there should be no big differences. One *regular* issue that many see is that if code can be optimized in such a way that it can run entirely on the CPU, no memory stores/loads, then it can lead to better temporary precision because the CPU typically has registers with higher precision than the types used to store in memory. But F# maps to IL which is JITted by the same engine as C# does, so there should be no big differences. – Lasse V. Karlsen May 20 '20 at 14:43
  • So here's the short answer: Yes, F# suffers from the same caveats. – Lasse V. Karlsen May 20 '20 at 14:43
  • @LasseV.Karlsen about the first comment: indeed the types are the same, but the point was about the intermediate calculations used internally, which may simply be "rounded down to the correct type" at the end. – Pac0 May 20 '20 at 15:38
  • @LasseV.Karlsen about the second comment: that would make a suitable potential answer, IMO. – Pac0 May 20 '20 at 15:39
  • @juharr: The question mentions medical imaging applications. These will involve a variety of mathematical operations, not the restricted operations of, say, working with money doing only addition, subtraction, and simple multiplication. No numerical format, binary or decimal, fixed-point or floating-point, can avoid rounding errors in general. 1/3 is not representable in decimal. Imaging will use sines, cosines, and other operations with results that are not representable in decimal formats. Using decimal does not fix the arithmetic. – Eric Postpischil May 20 '20 at 16:19
  • @EricPostpischil Even financial programs have a lot of non-trivial operations, like the 12th root if you need to convert a yearly interest rate to a monthly one. – curiousguy May 20 '20 at 16:45
  • @curiousguy: That is why I listed specific operations the example was restricted to. – Eric Postpischil May 20 '20 at 16:48
  • Thanks for your comments - the real issue is the unpredictability versus the reproducibility of the results, for instance if it goes through a scientific publication or patent. The fact that we must deal with approximations for any numerical calculation is OK. We just want the binary executable to reliably give the same results when given the same input. – Pac0 May 20 '20 at 17:40
  • When people used pen and paper to do calculations, they never had a problem figuring out how many decimal places calculations needed to be carried to. Clearly there's a problem with the requirements in this case, and not with the machines. Fix what needs fixing, rather than waste time on this nonsense. – Bent Tranberg May 20 '20 at 18:48
  • @BentTranberg Please elaborate. What is in need of a fix here? How is the runtime architecture and the compiler not guilty of inserting non determinism? – curiousguy May 20 '20 at 21:43
  • Never trust floating point to be deterministic. Historically there's been all sorts of things that influence it. Notably effects from parallel processing, varying floating point computational devices, and varying compilers. The resulting deviations are far below the threshold needed for calculations, so that's not a problem. But if code is written that assumes floats should be exactly the same down to the last bit in all circumstances, then there's a problem. This excludes a lot of opportunities for computing power - so we don't feel obliged to design computers, libraries and compilers like that. – Bent Tranberg May 21 '20 at 13:55
  • @BentTranberg But then how do you write any sort of predicate, like for sorting, for ordered sets, for hashed sets, etc. based on floats? – curiousguy May 22 '20 at 23:45
  • If you need to compare two floats, and they only differ in the umpteenth decimal, then why would you bother which one comes out on top? If you must take this into consideration, then treat the numbers from the computer as measurements rather than exact calculations. Then do the math to figure out whether rounding or truncation can reliably be used to compare or group, or whatever the need be. Btw, I believe that is the answer to the problem posed in the question - find out whether the discrepancy is actually significant or not. If it is, then I would reconsider the calculations being done. – Bent Tranberg May 23 '20 at 05:01
  • @BentTranberg So, if I understand correctly, the argument here is that my team and I have an X/Y problem if we consider using F# to help with this issue. Even though I'd have preferred a more specific answer for F#, this insight is definitely useful; you should post these comments or something like them as an answer, IMHO. I'll definitely upvote it (and maybe accept later if there is no more specific one). – Pac0 May 23 '20 at 09:21

1 Answer


In short: C# and F# share the same runtime and therefore do floating-point computations in the same way, so you will see the same behavior in F# as in C# when it comes to floating-point computations.

The issue of 0.1 + 0.2 != 0.3 spans most languages, as it comes from the IEEE standard for binary floating-point numbers, of which double is an example. In a binary floating-point format, 0.1, 0.2 and so on can't be exactly represented. This is one of the reasons some languages support hex float literals like 0x1.2p3, which can be represented exactly as a binary floating-point number (0x1.2p3 is equal to 9 in the decimal number system, btw).
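A minimal C# illustration of that first point (the printed text can vary between runtimes, the comparison result cannot):

```csharp
using System;

double sum = 0.1 + 0.2;          // neither 0.1 nor 0.2 is exactly representable in binary
Console.WriteLine(sum == 0.3);   // False
Console.WriteLine(sum);          // 0.30000000000000004 on recent .NET (shortest round-trip format)
Console.WriteLine(0.3);          // 0.3
```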

Lots of software that relies on double internally, like Microsoft Excel and Google Sheets, employs various cheats to make the numbers look nice, but these are often not numerically sound (I am no expert, I have just read a bit of Kahan).

In .NET and many other languages there is often a decimal data type, which is a decimal floating-point number, ensuring that 0.1 + 0.2 = 0.3 is true. However, it doesn't guarantee that 1/3 + 1/3 = 2/3, as 1/3 can't be represented exactly in a decimal number system. As there is no hardware support for decimal, it tends to be slower; in addition, the .NET decimal is not IEEE compliant, which may or may not be a problem.
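For example:

```csharp
using System;

decimal a = 0.1m + 0.2m;
Console.WriteLine(a == 0.3m);                 // True: 0.1, 0.2 and 0.3 are exact in decimal

decimal third = 1m / 3m;                      // 0.3333333333333333333333333333 (rounded)
Console.WriteLine(third + third == 2m / 3m);  // False: 1/3 is not exact in any decimal format
```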

If you have fractions and you have lots of clock cycles available, you can implement a "big rational" using BigInteger in F#. However, the fractions quickly grow very large, and it can't represent 12th roots, as mentioned in the comments, since the outcomes of roots are commonly irrational (i.e. can't be represented as rational numbers).
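A minimal sketch of that idea (in C# here, but F# can use the same System.Numerics.BigInteger; the BigRational type below is hypothetical, not an existing library type):

```csharp
using System;
using System.Numerics;

// Hypothetical minimal "big rational": exact for +, -, * (and / in the same style),
// but numerators and denominators grow quickly, and roots/logs have no exact result.
public readonly struct BigRational
{
    public BigInteger Numerator { get; }
    public BigInteger Denominator { get; }

    public BigRational(BigInteger n, BigInteger d)
    {
        if (d.IsZero) throw new DivideByZeroException();
        BigInteger g = BigInteger.GreatestCommonDivisor(n, d);
        if (d.Sign < 0) g = -g;                      // keep the denominator positive
        Numerator = n / g;
        Denominator = d / g;
    }

    public static BigRational operator +(BigRational a, BigRational b) =>
        new BigRational(a.Numerator * b.Denominator + b.Numerator * a.Denominator,
                        a.Denominator * b.Denominator);

    public static BigRational operator *(BigRational a, BigRational b) =>
        new BigRational(a.Numerator * b.Numerator, a.Denominator * b.Denominator);

    public override string ToString() => $"{Numerator}/{Denominator}";
}
```

With this, `new BigRational(1, 3) + new BigRational(1, 3)` is exactly 2/3, unlike the decimal example above, but a single Log10 or exp10 immediately takes you outside the rationals.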

I suppose you could preserve the whole computation symbolically and try to keep exact values for as long as possible, and then very carefully compute a final number. Probably quite hard to do correctly and most likely very slow.

I've read a bit of Kahan (he co-designed the 8087 and the IEEE standard for floating-point numbers), and according to one of the papers I read, a pragmatic approach to detecting rounding errors due to floating point is to compute thrice.

Once with the normal rounding rules, then with round-always-down, and finally with round-always-up. If the numbers are reasonably close at the end, the computation is likely sound.

According to Kahan, cute ideas like "coffins" (for each floating-point operation, produce a range instead of a single value, giving the min/max values) just don't work, as they are overly pessimistic and you end up with ranges that are infinitely large. That certainly matches my experience with the C++ Boost library that does this, and it's also very slow.
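For what it's worth, a naive version of that idea in C# could look like the sketch below (a hypothetical type; it widens every result by one ulp via Math.BitIncrement/Math.BitDecrement, available since .NET Core 3.0, so the exact result always stays inside the bounds without touching rounding modes). It also shows where the pessimism comes from: the width can only ever grow.

```csharp
using System;

// Naive "coffin"/interval arithmetic: every operation widens the bounds by one ulp,
// so the mathematically exact result is guaranteed to lie inside the interval,
// but the interval itself only ever gets wider.
public readonly struct Interval
{
    public double Lo { get; }
    public double Hi { get; }

    public Interval(double lo, double hi) { Lo = lo; Hi = hi; }

    public static Interval operator +(Interval a, Interval b) =>
        new Interval(Math.BitDecrement(a.Lo + b.Lo), Math.BitIncrement(a.Hi + b.Hi));

    public static Interval operator -(Interval a, Interval b) =>
        new Interval(Math.BitDecrement(a.Lo - b.Hi), Math.BitIncrement(a.Hi - b.Lo));

    public double Width => Hi - Lo;

    public override string ToString() => $"[{Lo:R}, {Hi:R}]";
}
```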

So when I worked with ERP software in the past, I did, based on what I had read of Kahan, recommend that we use decimals to eliminate "stupid" errors like 0.1 + 0.2 != 0.3, while realizing that there are still other sources of error, but eliminating those was beyond us in compute, storage and competence.

Hope this helps

PS. This is a complex topic. I once had a regression error when I changed the framework at some point. I dug into it and found that the error came from the fact that in the old framework the jitter used the old-style x87 (x86 FPU) instructions, while the new jitter relied on SSE/AVX instructions. There are many benefits to switching to SSE/AVX, but one thing that was lost is that the old-style FPU instructions internally use 80-bit floats, and the floating-point numbers are only rounded to 64 bits when they leave the FPU, while SSE/AVX uses 64 bits internally, so the results differed between frameworks.
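If what you mainly need is bit-for-bit reproducibility rather than the last bit of accuracy, one mitigation sometimes used is to force every intermediate back to 64 bits with an explicit cast; as far as I understand, the runtime is required to drop any excess precision when a value is stored to a field or array or explicitly converted, which is why a redundant (double) cast is the usual trick. A hedged sketch (whether it changes anything at all depends on the JIT; with an SSE-based JIT the casts are effectively no-ops):

```csharp
// Hypothetical helper: every intermediate is explicitly cast back to double so that
// an 80-bit x87 temporary (if the JIT produces one) cannot leak into the result.
static double StableMultiplyAdd(double a, double b, double c)
{
    double product = (double)(a * b);   // round the product to 64 bits
    return (double)(product + c);       // round the sum to 64 bits
}
```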

  • Thanks for your answer. I am perfectly fine with the rounding "problems" of floating-point numbers, as long as the results are reproducible (same input, same output). The problem is that the same binary and the same input don't give the same output on my machine and my colleagues' (same OS, but probably not the same internal CPU architecture). I think the PS is actually on topic. – Pac0 May 21 '20 at 08:32
  • Do you have a shareable code sample that causes the issues? – Just another metaprogrammer May 21 '20 at 08:38
  • Unfortunately, not now; it's proprietary code, and I'd need to run some experiments with the help of some colleagues (to see the discrepancy on their machines). I can tell that these calculations involve a few loops of ~tens or hundreds of iterations on a `double[255]`, and that they use the 4 basic operations and some Log10 and exp10. – Pac0 May 23 '20 at 09:25
  • I think it's unlikely that [FPU in CPU is to blame](https://stackoverflow.com/questions/13102167/do-fp-operations-give-exactly-the-same-result-on-various-x86-cpus), nor GPU if that's used, or OS, but check. I'd also check 32 vs 64 bit (I noticed), .NET version, VS version, C# and compiler version, any library version, any SDK, hyperthreading, BIOS settings, Release without optimization, whether parallelization could affect computation order. Is the same exe copied across, or is it just the source that is compiled and run locally? – Bent Tranberg May 23 '20 at 18:15