
Many questions have been asked on Stack Overflow and elsewhere about Python's confusing behaviour in calculations involving floats, which often return a result that is clearly wrong by a small amount. The explanation is invariably a link to the documentation on floating-point limitations; a practical, simple solution, however, is not usually provided.

It isn't just the error (which is usually negligible) - it is more the mess and inelegance of getting a result like 3.999999999999999 for a simple subtraction like 8.7 - 4.7.

I have written a simple solution for this, and my question is: why isn't something like this automatically implemented by Python behind the scenes?

The basic concept is to convert all floats into integers, to do the operation, and then convert back appropriately into a float. The difficulties explained in the above-linked doc only apply to floats, not to ints, which is why it works. Here is the code:

def justwork(x, operator, y):
    # Count the decimal places in each operand (via its string form).
    numx = numy = 0
    if "." in str(x):
        numx = len(str(x)) - str(x).find(".") - 1
    if "." in str(y):
        numy = len(str(y)) - str(y).find(".") - 1
    num = max(numx, numy)

    # Scale both operands up so they are (nominally) whole numbers.
    factor = 10 ** num
    newx = x * factor
    newy = y * factor

    if operator == "%":
        ans1 = x % y
        ans = (newx % newy) / factor
    elif operator == "*":
        ans1 = x * y
        ans = (newx * newy) / (factor ** 2)
    elif operator == "-":
        ans1 = x - y
        ans = (newx - newy) / factor
    elif operator == "+":
        ans1 = x + y
        ans = (newx + newy) / factor
    elif operator == "/":
        ans1 = x / y
        ans = newx / newy
    elif operator == "//":
        ans1 = x // y
        ans = newx // newy
    else:
        raise ValueError("unsupported operator: " + operator)

    return (ans, ans1)

This is admittedly rather inelegant and could probably be improved with a bit of thought, but it gets the job done. The function returns a tuple containing the correct result (computed via the integer conversion) and the incorrect result (computed directly with floats). Here are examples of how this gives accurate results, as opposed to doing it normally.

#code                           #returns tuple with (correct, incorrect) result
print(justwork(0.7,"%",0.1))    #(0.0, 0.09999999999999992)
print(justwork(0.7,"*",0.1))    #(0.07, 0.06999999999999999)
print(justwork(0.7,"-",0.2))    #(0.5, 0.49999999999999994)
print(justwork(0.7,"+",0.1))    #(0.8, 0.7999999999999999)
print(justwork(0.7,"/",0.1))    #(7.0, 6.999999999999999)
print(justwork(0.7,"//",0.1))   #(7.0, 6.0)

TLDR: Essentially the question is, Why are floats stored as base 2 binary fractions (which are inherently imprecise) when they could be stored the same way as integers (which Just Work)?
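As an aside, the standard library's `fractions` module already implements essentially this idea - storing a number as an exact ratio of two integers - so exact arithmetic on decimal inputs is available out of the box (a sketch of the idea, not a replacement for the function above):

```python
from fractions import Fraction

# Fraction stores an exact ratio of two integers, so short decimal
# inputs suffer no representation error at all.
a = Fraction('8.7')   # stored exactly as 87/10
b = Fraction('4.7')   # stored exactly as 47/10
print(a - b)          # 4
print(float(a - b))   # 4.0, not 3.999999999999999
```

The trade-off, as with any exact representation, is speed.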

gnoodle
  • This can only “work” in a simple domain, notably simple arithmetic with short decimal numerals. It will not work when more complicated computations are involved, such as non-decimal fractions or chains of computations that produce results that are not representable with short decimal numerals. As for why floating-point is used rather than a fixed integer format, it is because the point floats: floating-point numbers have a built-in scale that makes them able to handle very large or very small numbers, as occur in physics, for example. This is called dynamic range. – Eric Postpischil Jul 23 '20 at 16:53
  • [Decimal floating point](https://en.wikipedia.org/wiki/Decimal_floating_point) is a thing. On most systems you have better support for binary floating point, though (this is easier to implement efficiently in hardware). – chtz Jul 23 '20 at 16:53
  • The primary reason is performance, which would be too large a cost to pay for programs that do billions of floating-point calculations. And it is definitely not only Python. – Sayandip Dutta Jul 23 '20 at 16:54
  • Looks like python has a module for decimal floating point numbers: https://docs.python.org/3/library/decimal.html – chtz Jul 23 '20 at 16:59
  • @EricPostpischil you are correct that it won't work with more complicated numbers. It also won't work with exponentiation to a non-integer power. However, when it can be implemented (simple arithmetic), it should be – gnoodle Jul 23 '20 at 16:59
  • @SayandipDutta there should at least be an easily-usable secondary set of operators which prioritise accuracy over performance. For example, prefixing the operator with £ (random example) - £* would perform an accurate multiplication - and so on – gnoodle Jul 23 '20 at 17:02
  • The fact that this hasn't been implemented in any widely used general purpose programming language in the past half century suggests there isn't much demand for this feature. As you observe, it isn't difficult to implement for the simple case: I've seen banking software that used precisely this technique when computing transactions, but it was coded at the application level, not in the programming language. – snakecharmerb Jul 23 '20 at 17:16
  • Why do you want floats stored as base 10 decimal fractions (which are inherently imprecise) when they could be stored the same way as integers (which Just Work)? – Kelly Bundy Jul 23 '20 at 17:25
  • If you care for precision there is already a `decimal` module in standard library. To add another literal to account for something that is rarely necessary, and slows down performance in the long run is unnecessary in my opinion, considering it can be implemented, when needed, with minimal effort. – Sayandip Dutta Jul 23 '20 at 17:38
  • `justwork(1, '/', 3)` claims `0.3333333333333333` is the correct result. How is that correct? – Kelly Bundy Jul 23 '20 at 17:56
  • And if I change it to `return ans` (your claimed "correct" result), then `justwork(justwork(1, '/', 3), '*', 3)` results in `0.9999999999999998`. Whereas with `return ans1` (your claimed "incorrect" result) it results in `1.0`. – Kelly Bundy Jul 23 '20 at 18:04
  • @HeapOverflow obviously 1/3 can't be expressed as a base-10 decimal. So yes, 0.3333333333333333 isn't "correct". But within the context of a recurring decimal, that cannot be said to be incorrect - it is the best expression there is, and every human who sees it will know that it is 0.33 recurring. Whereas the kinds of mistakes returned by the normal operators are simply laughable, and unintelligible to most people. – gnoodle Jul 23 '20 at 20:20
  • @HeapOverflow and with (1/3)*3 - returning 0.99 recurring is simply the outcome of the loss of precision caused by the inability to express 1/3 as a decimal. The fact that the standard operators return the correct result is, I suspect, attributable more to a lucky mistake in that particular example (i.e. the number returned is slightly more than 0.33 recurring, so gets bumped up to 1.0) than to an ingenious solution – gnoodle Jul 23 '20 at 20:26
  • @SayandipDutta @chtz - you are correct, the decimal module is the best solution. Thanks – gnoodle Jul 23 '20 at 22:02
  • @HeapOverflow re my assertion that `(1/3)*3` returning `1` is a lucky mistake, not an intentional feature - this can be illustrated using the Decimal module (which as has been pointed out is the best approach) - `print((Decimal('0.1') / Decimal('0.3')) * Decimal('3.0'))` returns `0.9999999999999999999999999999` – gnoodle Jul 23 '20 at 22:07
  • It sounds like you are mixing `FLOAT` and `DOUBLE`. When numbers are stored in 32-bit FLOATs, but operated on with 64-bit DOUBLE, extra rounding occurs. This can lead to nasties like what you are showing. Could someone explain which encoding Python uses at each stage? – Rick James Jul 24 '20 at 16:40

1 Answer


Three points:

  1. the general method proposed in the question, while it avoids the problem in many cases, fails in the same way in many other cases, even relatively simple ones.
  2. there is a `decimal` module which provides accurate answers (even where the `justwork()` function in the question fails to)
  3. using the decimal module slows things down considerably - taking roughly 100 times longer. The default approach sacrifices accuracy to prioritise speed. [Whether making this the default is the right approach is debatable].

To illustrate these three points consider the following functions, loosely based on that in the question:

def justdoesntwork(x,operator,y):
    numx = numy = 0
    if "." in str(x):
        numx = len(str(x)) - str(x).find(".") -1
    if "." in str(y):
        numy = len(str(y)) - str(y).find(".") -1
    factor = 10 ** max(numx,numy)
    newx = x * factor
    newy = y * factor

    if operator == "+":     myAns = (newx + newy) / factor
    elif operator == "-":   myAns = (newx - newy) / factor
    elif operator == "*":   myAns = (newx * newy) / (factor**2)
    elif operator == "/":   myAns = (newx / newy)
    elif operator == "//":  myAns = (newx // newy)
    elif operator == "%":   myAns = (newx % newy) / factor

    return myAns

and

from decimal import Decimal
def doeswork(x,operator,y):
    if operator == "+":     decAns = Decimal(str(x)) + Decimal(str(y))
    elif operator == "-":   decAns = Decimal(str(x)) - Decimal(str(y))
    elif operator == "*":   decAns = Decimal(str(x)) * Decimal(str(y))
    elif operator == "/":   decAns = Decimal(str(x)) / Decimal(str(y))
    elif operator == "//":  decAns = Decimal(str(x)) // Decimal(str(y))
    elif operator == "%":   decAns = Decimal(str(x)) % Decimal(str(y))

    return decAns

and then looping through many values to find where myAns is different to decAns:

operatorlist = ["+", "-", "*", "/", "//", "%"]
for a in range(1,1000):
    x = a/10
    for b in range(1,1000):
        y = b/10
        for operator in operatorlist:
            myAns, decAns = justdoesntwork(x, operator, y), doeswork(x, operator, y)
            # len(str(decAns)) < 5 skips long/recurring decimals, which no
            # float answer could match exactly anyway
            if float(decAns) != myAns and len(str(decAns)) < 5:
                print(x, "\t", operator, " \t ", y, " \t=   ", decAns, "\t\t{", myAns, "}")

This goes through all values with 1 d.p. from 0.1 to 99.9 - and indeed it fails to find any values where myAns differs from decAns.

However, if it is changed to 2 d.p. (i.e. either `x = a/100` or `y = b/100`), then many examples appear. For example, 0.1+1.09: this can easily be checked by typing `((0.1*100)+(1.09*100)) / 100` in the console, which uses the basic method of the question, and which returns `1.1900000000000002` instead of `1.19`. The source of the error is `1.09*100`, which returns `109.00000000000001`. [Simply typing `0.1+1.09` gives the same error]. So the approach suggested in the question doesn't always work.
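The failure is easy to reproduce in a console:

```python
# A 2-d.p. case where scaling by a power of ten does not rescue the result:
# multiplying 1.09 by 100 does not give an exact integer.
print(1.09 * 100)                          # 109.00000000000001, not 109.0
print(((0.1 * 100) + (1.09 * 100)) / 100)  # 1.1900000000000002, not 1.19
print(0.1 + 1.09)                          # the same error, without any scaling
```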

Using Decimal() however returns the correct answer: Decimal('0.1')+Decimal('1.09') returns Decimal('1.19').

[Note: Don't forget to enclose the 0.1 and 1.09 in quotes. If you don't, `Decimal(0.1)+Decimal(1.09)` returns `Decimal('1.190000000000000085487172896')` - because it starts with the float 0.1, which is already stored inaccurately, and then converts that inaccurate value to Decimal - GIGO. `Decimal()` has to be fed a string to get the intended value. Taking a float, converting it to a string, and from there to Decimal does work, though; the problem arises only when going directly from float to Decimal.]
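A quick illustration of the difference:

```python
from decimal import Decimal

# A float literal already carries the representation error, so Decimal
# faithfully preserves the *wrong* value:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# A string (or str() applied to a float) carries the intended decimal value:
print(Decimal('0.1') + Decimal('1.09'))        # 1.19
print(Decimal(str(0.1)) + Decimal(str(1.09)))  # 1.19
```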


In terms of time cost, run this:

import timeit
operatorlist = ["+", "-", "*", "/", "//", "%"]

for operator in operatorlist:
    for a in range(1,10):
        a=a/10
        for b in range(1,10):
            b=b/10
            
            DECtime  = timeit.timeit("Decimal('" +str(a)+ "') " +operator+ " Decimal('" +str(b)+ "')", setup="from decimal import Decimal")
            NORMtime = timeit.timeit(str(a) +operator+ str(b))
            timeslonger = DECtime // NORMtime
            print("Operation:  ", str(a) +operator +str(b) , "\tNormal operation time: ", NORMtime, "\tDecimal operation time: ", DECtime, "\tSo Decimal operation took ", timeslonger, " times longer")

This shows that Decimal operations consistently take around 100 times longer, for all the operators tested.

[Including exponentiation in the list of operators shows that exponentiation can take 3000 - 5000 times longer. However, this is partly because `Decimal()` evaluates to far greater precision than normal operations - the Decimal default precision is 28 places - `Decimal("1.5")**Decimal("1.5")` returns 1.837117307087383573647963056, whereas `1.5**1.5` returns 1.8371173070873836. If you limit b to whole numbers by replacing `b=b/10` with `b=float(b)` (which prevents results with many significant figures), the Decimal calculation takes around 100 times longer, as with the other operators.]


It could still be argued that the time cost is only significant for users performing billions of calculations, and most users would prioritise getting intelligible results over a time difference which is pretty insignificant in most modest applications.

gnoodle
  • with thanks to @SayandipDutta, @EricPostpischil, @chtz and everyone else for their helpful comments – gnoodle Jul 24 '20 at 09:47