Take this number as an example:

1.64847910404205

If I create a Pandas DataFrame with a row and this value:

import pandas as pd

df = pd.DataFrame([{'id': 77, 'data': 1.64847910404205}])

and then iterate over the rows (Okay... the 'row') and inspect:

for index, row in df.iterrows():
    if index > 0:
        previous_row = df.iloc[index]

Of course, the above is odd: why would I iterate over the rows just to pull the same row back out of the DataFrame? Ignore that; I removed the `- 1` from the index to illustrate.

Now, if I use SciView (part of IntelliJ) and the data tab to inspect the rows individually, I get this:

row
data: 1.64847910404205

previous_row
data: 1.64847910404

Notice that previous_row appears to have been rounded. It seems to be because they are, for some reason, different data types...

row: 
type(row) #float64

previous_row:
type(previous_row) #numpy.float64

I'm curious to know: why does iloc convert to a numpy.float64 and how can I prevent it from doing so?

I need the same level of precision as I will later be doing Peak Signal to Noise Ratio (PSNR) calculations. Of course, I could just convert the float to a numpy.float64, but I don't want to lose precision.

pookie
  • It might just be the way it's displayed. What does `row == previous_row` return? – busybear Dec 12 '18 at 21:46
  • @busybear Oh, good call. It does show as being equal. Why would they display differently? The data types are different: is it just the `labels` which are different, while the actual `data` is the same? – pookie Dec 12 '18 at 21:48
  • Python doesn't have a `float64` builtin object (just `float`), so I don't think they are actually different data types. Perhaps `numpy.float64` was imported as `float64` somewhere. Just speculating. – busybear Dec 12 '18 at 21:58
  • Related: [Numpy float64 vs Python float](https://stackoverflow.com/questions/27098529/numpy-float64-vs-python-float) – jpp Dec 12 '18 at 23:12

1 Answer

The type of the 'data' column in your dataframe is numpy.float64, even if Pandas only reports it as float64. You can prove this to yourself with the following:

import numpy

df['data'].dtype.type is numpy.float64

which will return True. An alternative form would be:

type(df['data'].values[0]) is numpy.float64

which will also return True.

Any difference in display comes down to how SciView renders the values, not to the data stored in the DataFrame.
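If you want to confirm that nothing is lost, here is a minimal sketch using the DataFrame from the question. The variable names (`from_iterrows`, `from_iloc`, and the boolean flags) are illustrative and not from the original post. It compares the value seen during iteration with the one returned by `iloc`, by type, by equality, and by raw bytes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'id': 77, 'data': 1.64847910404205}])

# Grab the value as seen while iterating, and the same value via iloc.
for index, row in df.iterrows():
    from_iterrows = row['data']
    from_iloc = df.iloc[index]['data']

# Both are numpy.float64, which is itself a subclass of Python's float.
same_type = type(from_iterrows) is np.float64 and type(from_iloc) is np.float64
is_python_float = isinstance(from_iloc, float)

# Exact equality and identical raw bytes: the stored value is unchanged.
values_equal = bool(from_iterrows == from_iloc) and bool(from_iloc == 1.64847910404205)
bytes_equal = from_iterrows.tobytes() == from_iloc.tobytes()

print(same_type, is_python_float, values_equal, bytes_equal)

# Converting to a plain float and taking repr shows every stored digit.
full_repr = repr(float(from_iloc))
print(full_repr)
```

If all four flags print as `True`, the two objects hold the same 64-bit value and the shorter number in the inspector is only a display choice, so later PSNR arithmetic would see identical inputs either way.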

tel