I have a CSV file with different types of data. For example, some columns are categorical (e.g. city name), while others are numerical (e.g. product price).
I would like to read the file using Python 3 so that all categorical columns are one-hot encoded and the numerical columns are kept as plain scalar values.
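To make the target representation concrete, here is a minimal sketch using only the standard library. It assumes a hypothetical two-column file (`city`, `price`) with a header row; the sample data and the helper name `encode_rows` are illustrative, not part of any library.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file
SAMPLE = "city,price\nParis,10\nBerlin,25\nParis,7\n"

def encode_rows(text):
    """One-hot encode the 'city' column and keep 'price' as a scalar."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Fix a stable (sorted) order for the positions in the one-hot vector
    cities = sorted({row["city"] for row in rows})
    encoded = []
    for row in rows:
        one_hot = [1 if row["city"] == c else 0 for c in cities]
        encoded.append(one_hot + [int(row["price"])])
    return cities, encoded

cities, encoded = encode_rows(SAMPLE)
print(cities)   # ['Berlin', 'Paris']
print(encoded)  # [[0, 1, 10], [1, 0, 25], [0, 1, 7]]
```

Each row becomes one indicator per distinct city followed by the unchanged price.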
Something like this:
import numpy as np
# d is the path to the CSV file (columns: city name, integer price)
x = np.loadtxt(d, delimiter=',',
               dtype={'names': ('city', 'price'),
                      'formats': ('U20', int)})
But I also want the 'city' column to be one-hot encoded.
Is there a data loader or preprocessor in NumPy, pandas, or scikit-learn that can read the CSV and one-hot encode selected columns at the same time?
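One candidate, sketched under the assumption that pandas is available: `pandas.read_csv` loads the file, and `pandas.get_dummies` with `columns=[...]` replaces only the listed categorical columns with 0/1 indicator columns, leaving numerical columns such as `price` untouched. The inline sample data is hypothetical; with a real file you would pass its path to `read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice pass the CSV path to pd.read_csv
SAMPLE = "city,price\nParis,10\nBerlin,25\nParis,7\n"

df = pd.read_csv(io.StringIO(SAMPLE))

# Replace 'city' with one indicator column per distinct city;
# columns not listed (here 'price') are passed through unchanged
encoded = pd.get_dummies(df, columns=["city"], dtype=int)

print(encoded.columns.tolist())  # ['price', 'city_Berlin', 'city_Paris']
x = encoded.to_numpy()           # plain NumPy array, if one is needed
```

Passing `dtype=int` keeps the indicators as 0/1 integers; recent pandas versions otherwise default to booleans.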