I have a huge DataFrame (8 GB). I am trying to find out whether I will lose any information if I downsize the columns from int64 to int32 or from float64 to float32.
How can I tell whether information in the DataFrame will be lost in the conversion?
I have a huge DataFrame (8 GB). I am trying to find out whether I will lose any information if I downsize the columns from int64 to int32 ...
The simplest way to cast integers to a smaller type without losing information is to use
df['col'] = pd.to_numeric(df['col'], downcast='integer')
This performs the conversion only when it is lossless: pandas picks the smallest integer dtype that can hold every value in the column, and leaves the dtype unchanged otherwise. You'll need to do that for each integer column in your dataframe.
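A minimal sketch of that loop, using a small made-up frame (the column names and values here are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; replace with your own DataFrame.
df = pd.DataFrame({
    "small": np.array([1, 2, 3], dtype="int64"),
    "big": np.array([1, 2**40, 3], dtype="int64"),
    "price": np.array([1.5, 2.5, 3.5], dtype="float64"),
})

# Downcast every integer column; pd.to_numeric only shrinks the dtype
# when every value fits, so no information is lost.
for col in df.select_dtypes(include="integer").columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")

print(df.dtypes)  # "small" shrinks (e.g. to int8); "big" stays int64
```

Note that `downcast="integer"` goes to the smallest dtype that fits (possibly int8 or int16), not necessarily int32; float columns are untouched by this loop.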
... or from float64 to float32.
Casting a number to a smaller floating-point type always loses some information, unless the value is an exact binary fraction that fits in the narrower format. In practice, you can use a 32-bit float if you need around 7 significant decimal digits or fewer of precision (float32 has a 24-bit significand).
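One way to see this concretely is a round trip through float32: a value survives unchanged only if float32 can represent it exactly. The sample values below are made up for illustration:

```python
import numpy as np

# A round trip float64 -> float32 -> float64 reveals whether the
# narrower type can represent each value exactly.
vals = np.array([0.5, 1 / 3, 123456789.123], dtype="float64")
roundtrip = vals.astype("float32").astype("float64")

lossless = vals == roundtrip              # exact per-element comparison
max_err = np.abs(vals - roundtrip).max()  # worst-case absolute error
print(lossless)   # only 0.5, an exact binary fraction, survives intact
print(max_err)
```

The same round-trip comparison works column-wise on a DataFrame, letting you decide per column whether float32 is acceptable for your use case.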
You can easily check integers, e.g.:
df.abs().ge(2 ** 31).to_numpy().any()
If it's False, you're safe: int32 holds values from -2**31 to 2**31 - 1, so this check is even slightly conservative (it flags -2**31, which is in fact representable). Otherwise you might have to check column-wise and handle accordingly.
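A column-wise version of that check might look like the sketch below (the frame and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one column that fits int32 and one that doesn't.
df = pd.DataFrame({
    "ok": [1, -5, 100],
    "too_big": [1, 2**35, 3],
})

# Per-column: True where every value lies inside the int32 range.
info = np.iinfo(np.int32)
fits = df.apply(lambda s: s.between(info.min, info.max).all())
print(fits)  # ok: True, too_big: False
```

Columns where `fits` is True can be converted safely; the rest must stay int64.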
Like @anon01 mentioned, when it comes to floats it really depends on your use case.