
How can I reduce a DataFrame's memory footprint by finding the smallest suitable dtypes for its numeric columns? For example:

   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111

>>> df.dtypes
A      int64
B      int64
C    float64
D    float64

Expected result:

>>> df.dtypes
A       int8
B      int32
C    float32
D    float32
dtype: object
Mykola Zotko

1 Answer


You can use the downcast parameter of to_numeric, selecting the integer and float columns with DataFrame.select_dtypes. This works from pandas 0.19+, as @anurag mentioned, thank you:

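import pandas as pd

# Select the float and integer columns
fcols = df.select_dtypes('float').columns
icols = df.select_dtypes('integer').columns

# Downcast each column to the smallest dtype that holds its values
df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')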

print(df.dtypes)
A       int8
B      int32
C    float32
D    float32
dtype: object
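
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question, applies the same to_numeric downcast one column at a time (a small variation on the apply call above), and compares memory usage before and after. The helper name downcast_numeric is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Example frame from the question (default dtypes: int64 / float64)
df = pd.DataFrame({
    "A": [1, 2],
    "B": [1000000, -1000000],
    "C": [1.1, 2.1],
    "D": [1.111111, 2.111111],
})


def downcast_numeric(frame):
    """Hypothetical helper: return a copy with numeric columns downcast."""
    out = frame.copy()
    # Float columns: downcast to the smallest float dtype pandas allows
    for col in out.select_dtypes("float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    # Integer columns: downcast to the smallest signed integer dtype that fits
    for col in out.select_dtypes("integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


small = downcast_numeric(df)

# Bytes used by the column data only (index excluded)
before = df.memory_usage(index=False).sum()
after = small.memory_usage(index=False).sum()

print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Working on a copy keeps the original frame untouched, which makes it easy to compare the two or to check that the float values are still precise enough for your use case.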
marcelovca90
jezrael
    Just a link to [official pandas doc](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.19.0.html?highlight=downcast#downcast-values-to-smallest-possible-dtype-in-to-numeric) for downcasting – anurag Jan 22 '21 at 09:23
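
To see what the downcast option does on its own, here is a small sketch that calls pd.to_numeric directly on two Series. The variable names and sample values are just for illustration. Note that 'unsigned' only has an effect when no value is negative.

```python
import pandas as pd

small_vals = pd.Series([1, 2])               # fits in 8 bits
wide_vals = pd.Series([1000000, -1000000])   # needs 32 bits, has a negative

# Smallest signed integer dtype that holds the values
signed_small = pd.to_numeric(small_vals, downcast="integer")
signed_wide = pd.to_numeric(wide_vals, downcast="integer")

# Smallest unsigned integer dtype; skipped when negatives are present
unsigned_small = pd.to_numeric(small_vals, downcast="unsigned")
unsigned_wide = pd.to_numeric(wide_vals, downcast="unsigned")

print(signed_small.dtype)    # int8
print(signed_wide.dtype)     # int32
print(unsigned_small.dtype)  # uint8
print(unsigned_wide.dtype)   # int64 (unchanged because of the negative value)
```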