
I have a huge dataframe (8 GB). I am trying to find out whether I will lose any information if I downcast the columns from int64 to int32 or from float64 to float32.

How can I find out whether information in the dataframe will be lost in the conversion?

  • Your float64s won't become float32s by accident. Whether you need all that precision depends on what it's used for – anon01 Feb 08 '22 at 01:48
  • Is there any way to determine whether I am losing precision by converting? – Hari Upadrasta Feb 08 '22 at 01:53
  • https://stackoverflow.com/a/65842338/16746253 – StevenS Feb 08 '22 at 02:10
  • Good one @StevenS, did not know about that. Voting to close then. – fsl Feb 08 '22 at 02:17
  • Worth mentioning that string columns can be very expensive in terms of memory. The [df.memory_usage()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.memory_usage.html) function is very helpful for diagnosing memory problems. – Nick ODell Feb 08 '22 at 02:19

2 Answers


I have a huge dataframe (8 GB). I am trying to find out whether I will lose any information if I downcast the columns from int64 to int32 ...

The simplest way to cast integers to a smaller type and make sure that you are not losing information is to use

df['col'] = pd.to_numeric(df['col'], downcast='integer')

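This performs the conversion and picks the smallest integer dtype (down to int8) that can hold every value in the column, so no data is lost. A column whose values do not fit in a smaller type simply stays int64. You'll need to do that for each integer column in your dataframe.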
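Applying this to every integer column could look like the sketch below. The example frame and its column names are made up for illustration, and it assumes the columns to shrink are plain int64.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the column names are invented for illustration.
df = pd.DataFrame({
    "small": np.array([1, 2, 3], dtype="int64"),
    "big": np.array([1, 2, 2**40], dtype="int64"),
    "price": [1.5, 2.5, 3.5],
})

# Downcast each int64 column to the smallest integer dtype that holds its values.
for col in df.select_dtypes(include="int64").columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")

print(df.dtypes)
```

Here `small` ends up as int8, while `big` stays int64 because 2**40 does not fit in any smaller integer type.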

... or from float64 to float32.

Casting to a smaller floating-point type loses information unless every value is exactly representable in float32, for example a binary fraction with few enough significant bits. In practice, you can use 32-bit floats if you need about 7 significant decimal digits or fewer.
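One way to measure the loss on your own data is to round-trip a column through float32 and compare it with the original. This is only a sketch: the sample values are made up, and it assumes the column has no zeros (the relative error divides by the original) and no NaNs (which never compare equal).

```python
import numpy as np
import pandas as pd

# Made-up sample values: an exact binary fraction, a decimal fraction,
# and an integer just above 2**24 that float32 cannot represent.
s = pd.Series([0.5, 0.1, 16777217.0], dtype="float64")

# Round-trip through float32 and compare with the original values.
roundtrip = s.astype("float32").astype("float64")
exact = roundtrip.eq(s)                 # True where float32 stores the value exactly
rel_err = ((roundtrip - s) / s).abs()   # relative error introduced by the cast

print(exact.tolist())
print(rel_err.max())
```

Only 0.5 survives exactly here. The relative error of the other values stays below the float32 machine epsilon (about 1.2e-7), which is where the "around 7 digits" rule of thumb comes from.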

Nick ODell

You can easily check integers, e.g.:

(df.lt(-2 ** 31) | df.gt(2 ** 31 - 1)).to_numpy().any()

If it's False, you're safe. Otherwise you might have to check column-wise and handle accordingly.
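A column-wise version of that check might look like the following sketch. It takes the int32 bounds from `np.iinfo` and converts only the columns that fit; the example frame and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; column "b" holds a value one past the int32 maximum.
df = pd.DataFrame({
    "a": np.array([1, -5, 100], dtype="int64"),
    "b": np.array([0, 2**31, 7], dtype="int64"),
})

info = np.iinfo(np.int32)  # int32 range: -2**31 .. 2**31 - 1
int_df = df.select_dtypes(include="int64")

# Per-column flag: True if every value lies inside the int32 range.
fits = (int_df.min() >= info.min) & (int_df.max() <= info.max)

# Convert only the columns that fit; the others keep int64.
safe_cols = fits[fits].index
df = df.astype({col: "int32" for col in safe_cols})

print(fits)
print(df.dtypes)
```

Columns flagged False, like `b` here, are the ones to inspect and handle separately.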

Like @anon01 mentioned, when it comes to floats it really depends on your use case.

fsl