I am running into problems combining numeric (continuous) features with categorical features (factors) in a scikit-learn pipeline. My input is a pandas DataFrame. Right now, my code works for factors like 'gender', which can be transformed easily with a pipeline like this:
    ('gender', Pipeline([
        ('selector', ColumnSelector(column='gender')),
        ('dict', DictTransformer()),
        ('vect', DictVectorizer(sparse=False))
    ]))
But when I try to combine that with a numeric feature (for example, latitude) as follows,
    ('latitude', Pipeline([
        ('selector', ColumnSelector(column='latitude')),
        ('scaler', StandardScaler())
    ]))
I get an error:
    ValueError: all the input arrays must have same number of dimensions
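For reference, the same kind of error can be reproduced outside the pipeline. This is only a sketch under assumptions: the post does not show how the two pipelines are combined, so it assumes something like a FeatureUnion, which stacks dense branch outputs horizontally with `np.hstack`. The arrays and their names (`gender_features`, `latitude_values`) are made-up stand-ins, not the real data.

```python
import numpy as np

# Hypothetical stand-ins for the two branch outputs (not the real data).
# DictVectorizer(sparse=False) yields a 2-D array, one column per category.
gender_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # shape (3, 2)
# Selecting one column with data_frame['latitude'] yields 1-D values.
latitude_values = np.array([52.5, 48.1, 40.7])  # shape (3,)

try:
    # Stacking a 2-D block next to a 1-D block fails.
    np.hstack([gender_features, latitude_values])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Reshaping the 1-D values into a single column lets the stack succeed.
stacked = np.hstack([gender_features, latitude_values.reshape(-1, 1)])
print(stacked.shape)
```

The exact wording of the message depends on the NumPy version, but in each case it is a ValueError about the number of dimensions.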
Here is my code for ColumnSelector():
    class ColumnSelector(TransformerMixin):
        """
        Class for building sklearn Pipeline step. This class should be used to select a column from a pandas data frame.
        """
        def __init__(self, column):
            self.column = column

        def fit(self, x, y=None):
            return self

        def transform(self, data_frame):
            return data_frame[self.column]
Obviously I'm missing something important here. Any ideas?
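One thing worth checking is the shape that the selector returns. The sketch below is an assumption, not a confirmed fix: `ColumnSelector2D` is a hypothetical variant that indexes with a list of column names, so the result is a one-column DataFrame rather than a 1-D Series. The sample frame is made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelector2D(BaseEstimator, TransformerMixin):
    """Hypothetical variant of ColumnSelector that keeps the column 2-D."""

    def __init__(self, column):
        self.column = column

    def fit(self, x, y=None):
        return self

    def transform(self, data_frame):
        # Double brackets return a one-column DataFrame, shape (n_samples, 1),
        # instead of a 1-D Series.
        return data_frame[[self.column]]


# Made-up data for illustration only.
frame = pd.DataFrame({'gender': ['f', 'm', 'f'],
                      'latitude': [52.5, 48.1, 40.7]})

series_shape = frame['latitude'].shape
selected_shape = ColumnSelector2D(column='latitude').fit_transform(frame).shape
print(series_shape, selected_shape)
```

A separate class is used here because the 'gender' pipeline may rely on the original selector returning a Series; whether DictTransformer accepts a one-column DataFrame is not something this sketch checks.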