I'm trying to downcast the columns of a CSV while reading it, because doing it after the file has been read is too time consuming. So far so good. The problem occurs, of course, when a column has NA values. Is there any possibility to ignore those, or to filter them out during the read, maybe with the 'converters' argument of pandas read_csv? And what does the 'verbose' argument do? The documentation only says "Indicate number of NA values placed in non-numeric columns."
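For illustration, this is roughly what I imagine a converter-based approach would look like (the inline CSV and the fill_int function are made up for the example; I don't know whether this would even preserve the small dtype, which is part of my question):

import io
import numpy as np
import pandas as pd

csv = io.StringIO("a,b\n1,1.5\n,2.5\n3,3.5\n")

# made-up converter: converters receive the raw string of each cell,
# so replace empty fields with 0 before casting
def fill_int(value):
    return np.int8(int(value)) if value != '' else np.int8(0)

# keys of 'converters' can be column labels or positional indices
df = pd.read_csv(csv, converters={'a': fill_int})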
My approach for downcasting so far is to read only the first two rows and guess the dtypes from them. From that I create a mapping dict for the dtype argument and then read the whole CSV with it. Of course NaN values can occur in later rows, and that is where the mixed dtypes come in:
import pandas as pd

# read only the first two rows to guess the dtypes cheaply
df = pd.read_csv(filePath, delimiter=delimiter, nrows=2, low_memory=True, memory_map=True, engine='c')

if downcast:
    # map the guessed dtypes to smaller ones and build {column index: dtype}
    mapdtypes = {'int64': 'int8', 'float64': 'float32'}
    dtypes = list(df.dtypes.apply(str).replace(mapdtypes))
    dtype = {key: value for (key, value) in enumerate(dtypes)}
    # read the whole file again with the downcast dtypes
    df = pd.read_csv(filePath, delimiter=delimiter, memory_map=True, engine='c', low_memory=True, dtype=dtype)
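To make the problem concrete, here is a minimal made-up example (the inline CSV is invented, not my real data) of what goes wrong when a column I force to int8 turns out to have a missing value further down the file:

import io
import pandas as pd

csv = io.StringIO("a,b\n1,1.5\n,2.5\n")  # the second value in column 'a' is missing

try:
    # forcing int8 on a column that contains an NA value
    df = pd.read_csv(csv, dtype={'a': 'int8', 'b': 'float32'})
except ValueError as err:
    print(err)  # complains about NA values in the integer column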