12

I have a dictionary that I want to write to a csv file, but the floats in the dictionary are rounded off when I write them to the file. I want to keep the maximum precision.

Where does the rounding occur and how can I prevent it?

What I did

I followed the DictWriter example here and I'm running Python 2.6.1 on Mac (10.6 - Snow Leopard).


# my import statements
import sys
import csv

Here is what my dictionary (d) contains:

>>> d = runtime.__dict__
>>> d
{'time_final': 1323494016.8556759,
'time_init': 1323493818.0042379,
'time_lapsed': 198.85143804550171}

The values are indeed floats:

>>> type(runtime.time_init)
<type 'float'>

Then I set up my writer and write the header and values:

f = open(log_filename,'w')
fieldnames = ('time_init', 'time_final', 'time_lapsed')
myWriter = csv.DictWriter(f, fieldnames=fieldnames)
headers = dict( (n,n) for n in fieldnames )
myWriter.writerow(headers)
myWriter.writerow(d)
f.close()

But when I look in the output file, the floats have been rounded:

time_init,time_final,time_lapsed
1323493818.0,1323494016.86,198.851438046

< EOF >

Martin Thoma
aDroid

3 Answers

7

It looks like csv is using float.__str__ rather than float.__repr__:

>>> print repr(1323494016.855676)
1323494016.855676
>>> print str(1323494016.855676)
1323494016.86

Looking at the csv source, this appears to be hardwired behavior. A workaround is to convert all of the float values to their repr before csv gets to them. Use something like: d = dict((k, repr(v)) for k, v in d.items()).

Here's a worked-out example:

import sys, csv

d = {'time_final': 1323494016.8556759,
     'time_init': 1323493818.0042379,
     'time_lapsed': 198.85143804550171
}

d = dict((k, repr(v)) for k, v in d.items())

fieldnames = ('time_init', 'time_final', 'time_lapsed')
myWriter = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
headers = dict( (n,n) for n in fieldnames )
myWriter.writerow(headers)
myWriter.writerow(d)

This code produces the following output:

time_init,time_final,time_lapsed
1323493818.0042379,1323494016.8556759,198.85143804550171

A more refined approach will take care to only make replacements for floats:

d = dict((k, (repr(v) if isinstance(v, float) else str(v))) for k, v in d.items())

Note, I've just fixed this issue for Py2.7.3, so it shouldn't be a problem in the future. See http://hg.python.org/cpython/rev/bf7329190ca6
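To check that the repr() workaround really preserves the values, here is a minimal round-trip sketch. It is written for Python 3 rather than the 2.6 used in the question, and it assumes an in-memory io.StringIO buffer in place of the log file; neither of those comes from the original post.

```python
import csv
import io

d = {'time_final': 1323494016.8556759,
     'time_init': 1323493818.0042379,
     'time_lapsed': 198.85143804550171}
fieldnames = ('time_init', 'time_final', 'time_lapsed')

# Convert only the floats; leave every other value for csv to handle.
row = dict((k, repr(v) if isinstance(v, float) else v) for k, v in d.items())

# In-memory stand-in for the log file (an assumption for this sketch).
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(row)

# Read the data back and turn each field into a float again.
buf.seek(0)
restored = dict((k, float(v)) for k, v in next(csv.DictReader(buf)).items())
print(restored == d)  # True
```

If the comparison holds, the consumer of the file gets back exactly the floats that were written.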

Raymond Hettinger
  • Awesome, works perfectly! Also, thanks for the link to the source. I'm still learning how to navigate the Python docs, a little awkward to me. I added a datetime to the dictionary as well and it gets written as "datetime.date(2011, 12, 10)" which is expected from what you provided. I'll just throw the date in the filename and get it that way. Awesome job! +1 – aDroid Dec 10 '11 at 18:16
  • -1 Awesome sledgehammer, works imperfectly: "Fixes" floats, wrecks datetimes. – John Machin Dec 10 '11 at 19:05
  • True, but I didn't specify datetimes in the original problem so it wasn't something to consider for the original answer. – aDroid Dec 11 '11 at 00:32
  • 2
    Awesome. I don't know how often questions here contribute directly to the source, but for my first question I'm glad I posted it! Python's been growing on me over the past few weeks I've been working with it, and now that my changes (i.e., the changes you made on my behalf) have been incorporated into the source I can now say I've been fully assimilated by Python. :) Thanks again. – aDroid Dec 12 '11 at 18:35
2

It's a known bug^H^H^Hfeature. According to the docs:

"""... the value None is written as the empty string. [snip] All other non-string data are stringified with str() before being written."""

Don't rely on the default conversions. Use repr() for floats. unicode objects need special handling; see the manual. Check whether the consumer of the file will accept the default format of datetime.x objects for x in (datetime, date, time, timedelta).
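One way to avoid relying on the default conversions is to pass every value through an explicit converter before handing the row to csv. The sketch below is only an illustration: to_csv_field is a hypothetical helper name, the choice of ISO 8601 for dates and times is an assumption about what the consumer accepts, and the Python 2 unicode handling mentioned above is not covered.

```python
import datetime

def to_csv_field(value):
    """Return an explicit string form for one value (hypothetical helper)."""
    if value is None:
        return ''                    # same result as the csv module's default for None
    if isinstance(value, float):
        return repr(value)           # round-trips through float()
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()     # ISO 8601, an assumed choice of format
    return str(value)                # everything else: plain str(), as csv would do

row = {'time_lapsed': 198.85143804550171, 'day': datetime.date(2011, 12, 10)}
converted = dict((k, to_csv_field(v)) for k, v in row.items())
```

The converted dictionary contains only strings, so csv.DictWriter writes them unchanged.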

Update:

For float objects, "%f" % value is not a good substitute for repr(value). The criterion is whether the consumer of the file can reproduce the original float object. repr(value) guarantees this. "%f" % value doesn't.

# Python 2.6.6
>>> nums = [1323494016.855676, 1323493818.004238, 198.8514380455017, 1.0 / 3]
>>> for v in nums:
...     rv = repr(v)
...     fv = "%f" % v
...     sv = str(v)
...     print rv, float(rv) == v, fv, float(fv) == v, sv, float(sv) == v
...
1323494016.8556759 True 1323494016.855676 True 1323494016.86 False
1323493818.0042379 True 1323493818.004238 True 1323493818.0 False
198.85143804550171 True 198.851438 False 198.851438046 False
0.33333333333333331 True 0.333333 False 0.333333333333 False

Judging by the strings alone, it looks as if none of the %f cases worked, yet the first two compare equal to the original. Before 2.7, Python's repr always used 17 significant decimal digits. In 2.7, this was changed to use the minimum number of digits that still guarantees float(repr(v)) == v. The difference is not a rounding error.

# Python 2.7 output
1323494016.855676 True 1323494016.855676 True 1323494016.86 False
1323493818.004238 True 1323493818.004238 True 1323493818.0 False
198.8514380455017 True 198.851438 False 198.851438046 False
0.3333333333333333 True 0.333333 False 0.333333333333 False

Note the improved repr() results in the first column above.

Update 2 in response to comment """And thanks for the info on Python 2.7. Unfortunately, I'm limited to 2.6.2 (running on the destination machine which can't be upgraded). But I'll keep this in mind for future scripts. """

It doesn't matter. float('0.3333333333333333') == float('0.33333333333333331') produces True on all versions of Python. This means that you could write your file on 2.7 and it would read the same on 2.6, or vice versa. There is no change in the accuracy of what repr(a_float_object) produces.
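The equivalence of the two repr styles can be checked directly. This is a small sketch for Python 3, which uses "%.17g" to produce the 17-significant-digit form seen in the 2.6 output above; treating that format as a stand-in for the old repr is an assumption of the sketch.

```python
v = 1.0 / 3

old_style = "%.17g" % v   # 17 significant digits, as in the 2.6 output
new_style = repr(v)       # shortest string that still round-trips

print(old_style, new_style)
print(float(old_style) == float(new_style) == v)
```

Both strings parse to the same float, so a file written with either style reads back identically.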

John Machin
  • Thanks for pointing out why this is happening. I might have seen the "stringified with str()" but my n00bness with Python didn't raise a flag w.r.t. str(). – aDroid Dec 10 '11 at 18:23
  • The OP stated that he is new to Python. Working code that fixes his problem is what is needed rather than a cavalier, academic answer. – Raymond Hettinger Dec 10 '11 at 20:21
  • The code in the OP's question shows that "new to Python" is modest; he seemed to be capable of producing code that would iterate over a dict and update its values without handholding. – John Machin Dec 10 '11 at 20:56
  • Though I was only able to do so after hours of searching and tweaking, but eventually got it. That is until I ran into the original problem, which 4 hours of hair pulling didn't solve. Both of your answers were helpful for the different points you each made. And thanks for the info on Python 2.7. Unfortunately, I'm limited to 2.6.2 (running on the destination machine which can't be upgraded). But I'll keep this in mind for future scripts. – aDroid Dec 11 '11 at 00:48
1

This works but it is probably not the best/most efficient way:

>>> import csv
>>> from StringIO import StringIO
>>> f = StringIO()
>>> w = csv.DictWriter(f,fieldnames=headers)
>>> w.writerow(dict((k,"%f"%d[k]) for k in d.keys()))
>>> f.getvalue()
'1323493818.004238,1323494016.855676,198.851438\r\n'
Burhan Khalid
  • Looks like your floats are rounding too, unless that's an artifact of getvalue(). I'll look into it. – aDroid Dec 10 '11 at 18:23
  • Nothing to do with getvalue. `%f` formatting uses only 6 decimal places in some cases. "Looks like" is deceptive; see my updated answer. – John Machin Dec 10 '11 at 20:38