
I have a large CSV file with more than 200 columns. Some of the columns are strings, some varchar, some integers and some floats.

When I just read my CSV file into a pandas DataFrame, it is able to detect which columns are numerical. However, it gives me the DtypeWarning telling me to specify a dtype option on import or set low_memory=False.

import numpy as np
import pandas as pd

df = pd.read_csv('myfile.csv')
# np.number already covers the int/float subtypes, so excluding it
# (plus bool) leaves only the non-numeric columns
df_not_num = df.select_dtypes(exclude=[np.number, np.bool_])
print(len(df.columns))
>>> 200
print(len(df_not_num.columns))
>>> 10

Then I tried specifying a dtype: dtype='unicode'. But this causes all my columns to become objects. It is too much manual work to specify a dtype per column name when reading the CSV into a DataFrame.

df = pd.read_csv('myfile.csv', dtype='unicode')
df_not_num = df.select_dtypes(exclude=[np.number, np.bool_])
print(len(df.columns))
>>> 200
print(len(df_not_num.columns))  # every column is object now, so nothing is excluded
>>> 200

So specifying a dtype seems to be the only way to actually fix (rather than just silence) the warning. But how do I specify that I have mixed dtypes across different columns without having to manually list the dtype of each of the 200 columns?
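
One idea I have is a two-pass read (just a sketch, untested on the real file; the 1000-row sample size is arbitrary): let pandas infer the dtypes from a sample, then feed that inferred mapping back in as the per-column dtype dict:

import pandas as pd

# Pass 1: let pandas infer dtypes from the first 1000 rows only
sample = pd.read_csv('myfile.csv', nrows=1000)
dtype_map = sample.dtypes.to_dict()

# Pass 2: re-read the full file with an explicit per-column dtype
df = pd.read_csv('myfile.csv', dtype=dtype_map)

The obvious caveat is that the sample has to be representative: if a column looks like integers in the first 1000 rows but contains blanks further down, the second read will fail or need cleanup. Is there a better way?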

jxn
  • Just specifying "mixed types" won't help `read_csv`. You either have to specify particular types for some columns by passing a dict, e.g. `{'a': np.float64, 'b': np.int32}`, or specify one dtype, which pandas will try to apply to all columns, or none. Also, there is no "varchar" type in Python. – juanpa.arrivillaga Feb 27 '17 at 23:42
  • Possible duplicate of [Pandas read\_csv low\_memory and dtype options](http://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options) – juanpa.arrivillaga Feb 27 '17 at 23:44

0 Answers