- I am working with a very large dataset that won't fit in my RAM (which is 16 GB).
- I noticed that the columns' `dtype`s are all `float64`, but the values in the first 10k rows range from `-1.0` to `+1.0`.
- Checking the full dataset would take too much time, so I want to set the `dtype` of all columns to `float16` in `read_csv` to reduce the memory needed:
```python
types = {}
for column in only_first_row_dataframe.columns:
    types[column] = 'float16'
...
dataframe = pd.read_csv(path, engine="c", dtype=types, low_memory=False)
```
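For completeness, the same `dtype` map can be built with a dict comprehension. This is just a sketch; the file contents below are made up, and `only_first_row_dataframe` is read from an in-memory stand-in for the real CSV:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-in for the real file: read only the header row
# to get the column names, then map every column to float16.
sample = StringIO("a,b,c\n0.1,0.2,0.3\n")
only_first_row_dataframe = pd.read_csv(sample, nrows=1)
types = {column: "float16" for column in only_first_row_dataframe.columns}
print(types)  # {'a': 'float16', 'b': 'float16', 'c': 'float16'}
```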
After running the above code, would I be notified that some values didn't fit into the 16-bit float and that, therefore, some data were lost?
- I am asking this question because I only tested whether the first 10k rows fit into the range `(-1.0, +1.0)`, so I want to be sure I won't lose any data.
- When I run the code I don't get any warnings, and the dataset is loaded into my RAM, but I am not certain whether any data were lost.
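To illustrate my worry with a made-up example (the column names and values below are invented): values inside `(-1.0, +1.0)` fit into `float16`'s range, which extends to about `65504`, but only with reduced precision, since `float16` carries roughly 3 significant decimal digits; values beyond that maximum silently become `inf`:

```python
import numpy as np
import pandas as pd
from io import StringIO

# Hypothetical CSV: column "a" stays inside (-1, 1), column "b" does not.
csv = StringIO("a,b\n0.123456789,70000.0\n")
df = pd.read_csv(csv, engine="c", dtype={"a": "float16", "b": "float16"})

print(df["a"].iloc[0])            # ~0.1235 -- the value is kept, precision is reduced
print(np.isinf(df["b"].iloc[0]))  # True -- 70000.0 exceeds the float16 maximum
print(np.finfo(np.float16).max)   # 65504.0
```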
- According to this answer, I will be notified if there is a major error in the `dtype`s: for example, if column `A` has a `string` value at the end but I specified its `dtype` as `int`. But there is no mention of the problem I am asking about here.
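For reference, here is my understanding of the error case that answer describes, reproduced with a made-up CSV where a column declared as `int` ends with a string value:

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV: "colA" is declared as int64 but contains a string at the end.
csv = StringIO("colA\n1\n2\nhello\n")
error_message = None
try:
    pd.read_csv(csv, dtype={"colA": "int64"})
except ValueError as err:
    error_message = str(err)  # pandas raises ValueError instead of loading silently
print(error_message)
```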