I am in the process of reducing the memory usage of my code. The goal of this code is to handle some big datasets, which are stored in pandas DataFrames, if that is relevant.
Among many other columns, there are some small integers. Because they contain missing values (NA), pandas stores them as float64 by default. I tried to downcast them to a smaller integer type (int8 or int16, for example), but I got an error because of the NA values.
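Here is a minimal reproduction of the problem, assuming a small hypothetical Series `s`. The missing value forces pandas to store the column as float64, and casting it to the NumPy `int8` dtype raises a `ValueError`. The exact exception subclass and message vary between pandas versions.

```python
import pandas as pd

# Hypothetical column of small integers with one missing value
s = pd.Series([1, 2, None, 4])
print(s.dtype)  # float64: the missing value is stored as NaN

try:
    s.astype("int8")
    cast_failed = False
except ValueError as exc:
    # NumPy integer dtypes have no NaN, so pandas refuses the cast
    cast_failed = True
    print(f"{type(exc).__name__}: {exc}")
```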
It seems that there are some new nullable integer types (such as Int64) that can handle missing values, but they wouldn't help with memory usage. I have given some thought to using a category, but I am not sure it won't create a bottleneck further down the pipeline. Downcasting float64 to float32 seems to be my main option for reducing memory usage (rounding errors do not really matter for my use case).
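The sketch below compares the options mentioned above on a synthetic column. The data is hypothetical: one million values from 0 to 99, with roughly 10% missing. Each candidate is measured with `Series.memory_usage(deep=True, index=False)`. Real figures depend on your data, especially on how many distinct values the category ends up holding.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000,000 small integers (0-99) with ~10% missing
rng = np.random.default_rng(0)
values = rng.integers(0, 100, size=1_000_000).astype("float64")
values[rng.random(values.size) < 0.1] = np.nan
s = pd.Series(values)

# The candidate representations discussed in the question
candidates = {
    "float64": s,
    "float32": s.astype("float32"),
    "Int64": s.astype("Int64"),        # nullable integer: values + boolean mask
    "category": s.astype("category"),  # integer codes + a small categories index
}

# deep=True also counts the Int64 mask and the category labels
usage = {
    name: col.memory_usage(deep=True, index=False)
    for name, col in candidates.items()
}
for name, nbytes in usage.items():
    print(f"{name:>8}: {nbytes / 1e6:.2f} MB")
```

On data like this, float32 halves the float64 footprint. Int64 is slightly larger than float64 because it keeps a boolean mask next to 8-byte values. The category stores one small integer code per row plus the list of distinct values.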
Is there a better option for reducing the memory consumption of small integers with missing values?