I have a large CSV file with more than 200 columns. Some of the columns are strings (varchar), some are integers, and some are floats.
When I just read the CSV into a pandas dataframe, pandas is able to detect which columns are numerical. However, it raises a DtypeWarning telling me to specify the dtype option on import or set low_memory=False.
import numpy as np
import pandas as pd

df = pd.read_csv('myfile.csv')
# np.number already covers the int and float subtypes
df_not_num = df.select_dtypes(exclude=[np.number, np.bool_])
print(len(df.columns))
>>> 200
print(len(list(df_not_num)))
>>> 10
Then I tried specifying a dtype with dtype='unicode', but that causes every column to come back as object. Specifying the dtype for each column by name when reading the CSV (sketched after the code below) is too much manual work for 200+ columns.
df = pd.read_csv('myfile.csv', dtype='unicode')
df_not_num = df.select_dtypes(exclude=[np.number, np.bool_])
print(len(df.columns))
>>> 200
print(len(list(df_not_num)))
>>> 200
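To be clear about what I mean by manual work, per-column specification would look something like this (the column names here are made up for illustration; a real mapping would need all 200+ entries):

dtypes = {'customer_id': 'int32',   # hypothetical column names
          'name': 'object',
          'score': 'float32'}
df = pd.read_csv('myfile.csv', dtype=dtypes)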
So it seems the way to avoid the low_memory warning is to specify dtypes. But how do I tell pandas that I have mixed dtypes across columns without having to manually specify the dtype of each of the 200 columns?
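The closest workaround I can think of is letting pandas infer the dtypes from a small sample and reusing that mapping for the full read. This is an untested sketch: the 1000-row sample size is arbitrary, and a column whose first 1000 rows look numeric could still fail to parse further down the file.

import pandas as pd

# infer dtypes from a small sample of the file (1000 rows is arbitrary)
sample = pd.read_csv('myfile.csv', nrows=1000)

# reuse the inferred column -> dtype mapping for the full read
df = pd.read_csv('myfile.csv', dtype=sample.dtypes.to_dict())

But that feels fragile, so I would prefer a supported way to declare mixed dtypes directly.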