2

pandas.DataFrame.to_markdown converts large integers to floats. Is this a bug or a feature? Are there any solutions?

>>> df = pd.DataFrame({"A": [123456, 123456]})
>>> print(df.to_markdown())
|    |      A |
|---:|-------:|
|  0 | 123456 |
|  1 | 123456 |

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> print(df.to_markdown())
|    |           A |
|---:|------------:|
|  0 | 1.23457e+06 |
|  1 | 1.23457e+06 |

>>> print(df)
         A
0  1234567
1  1234567

>>> print(df.A.dtype)
int64
Marc

2 Answers

2

I initially found only a workaround, but not the explanation: converting the column to strings.

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> df["A"] = df.A.astype(str)
>>> print(df.to_markdown())
|    |       A |
|---:|--------:|
|  0 | 1234567 |
|  1 | 1234567 |

Update:

I think it is caused by two things:

  • The _column_type function in tabulate:
def _column_type(strings, has_invisible=True, numparse=True):
    """The least generic type all column values are convertible to.

It can be solved by disabling the conversion via tablefmt="pretty":

>>> print(df.to_markdown(tablefmt="pretty"))
+---+---------+
|   |    A    |
+---+---------+
| 0 | 1234567 |
| 1 | 1234567 |
+---+---------+
  • When there is more than one column and one of them contains floats. Since tabulate uses df.values to extract the data, which converts the DataFrame to a numpy.array, all values are converted to a common dtype (float). This is also discussed in this issue.
>>> df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})
>>> print(df)
         A    B
0  1234567  0.1
1  1234567  0.2

>>> print(df.A.dtype)
int64

>>> print(df.to_markdown(tablefmt="pretty"))
+---+-----------+-----+
|   |     A     |  B  |
+---+-----------+-----+
| 0 | 1234567.0 | 0.1 |
| 1 | 1234567.0 | 0.2 |
+---+-----------+-----+

>>> df.values
array([[1.234567e+06, 1.000000e-01],
       [1.234567e+06, 2.000000e-01]])
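
The upcast described in the second point can be reproduced without tabulate. The sketch below only uses pandas; the object-dtype cast at the end is my own illustration of one way to keep the integers intact in the extracted array, not something taken from the linked issue, and whether tabulate then renders them as integers may depend on its version.

```python
import pandas as pd

df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})

# Inside the DataFrame each column keeps its own dtype.
column_dtypes = df.dtypes.astype(str).to_dict()
print(column_dtypes)

# A single 2-D array has exactly one dtype, so the integers in A
# are upcast to float64 to fit alongside the floats in B.
mixed = df.to_numpy()
print(mixed.dtype, mixed[0, 0])

# Casting to object first keeps each cell as a separate Python object,
# so the integer stays an integer in the extracted array.
boxed = df.astype(object).to_numpy()
print(boxed.dtype, boxed[0, 0])
```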
Marc
  • I'll leave this answer unaccepted awhile to see if someone can provide a better answer than my own. – Marc Dec 25 '20 at 22:31
0

If you check the pandas options, the default number of significant digits is 6.

import pandas as pd

pd.describe_option()

display.precision : int
    Floating point output precision (number of significant digits). This is
    only a suggestion
    [default: 6] [currently: 6]
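
The six-digit cutoff seen in the question can be reproduced with plain Python string formatting. This is a hedged sketch: it assumes the table is rendered with the general "g" float format (which, as far as I know, is tabulate's default floatfmt), whose default precision is also six significant digits. It does not show that display.precision itself is involved.

```python
# Values from the question, already converted to float.
small = float(123456)
large = float(1234567)

# "g" keeps six significant digits by default and switches to
# scientific notation once the number needs more than that.
small_g = format(small, "g")
large_g = format(large, "g")
print(small_g)  # 123456
print(large_g)  # 1.23457e+06

# A fixed-point format with no decimals keeps every digit.
large_fixed = format(large, ".0f")
print(large_fixed)  # 1234567
```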
r-beginners