I was able to replicate the example given in the GitHub repo. However, when I tried it on my own data, I got a ValueError.

Below is a dummy dataset that gives the same error as my real data.

import pandas as pd
import numpy as np
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler

data = pd.DataFrame({
    'pet': ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'],
    'children': [4., 6, 3, 3, 2, 3, 5, 4],
    'salary': [90, 24, 44, 27, 32, 59, 36, 27],
    'feat4': ['linear', 'circle', 'linear', 'linear', 'linear', 'circle', 'circle', 'linear'],
})

mapper = DataFrameMapper([
    (['pet', 'feat4'], LabelEncoder()),
    (['children', 'salary'], [StandardScaler(),
                              MinMaxScaler()])
]) 

np.round(mapper.fit_transform(data.copy()),2)

Below is the error:

ValueError                                Traceback (most recent call last)
in ()
----> 1 np.round(mapper.fit_transform(data.copy()),2)

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    453         if y is None:
    454             # fit method of arity 1 (unsupervised transformation)
--> 455             return self.fit(X, **fit_params).transform(X)
    456         else:
    457             # fit method of arity 2 (supervised transformation)

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn_pandas\dataframe_mapper.py in fit(self, X, y)
     95         for columns, transformers in self.features:
     96             if transformers is not None:
---> 97                 transformers.fit(self._get_col_subset(X, columns))
     98         return self
     99

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit(self, y)
    106         self : returns an instance of self.
    107         """
--> 108         y = column_or_1d(y, warn=True)
    109         _check_numpy_unicode_bug(y)
    110         self.classes_ = np.unique(y)

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\utils\validation.py in column_or_1d(y, warn)
    549         return np.ravel(y)
    550
--> 551     raise ValueError("bad input shape {0}".format(shape))
    552
    553

ValueError: bad input shape (8, 2)

Can anyone help?

thanks

wi3o

1 Answer

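You should only pass multiple columns to a single transformer if that transformer actually accepts multi-column (2-D) input (e.g. sklearn.decomposition.PCA(1) in the documentation). LabelEncoder expects a single 1-D array of labels, which is why it rejects the (8, 2) input. In your case the error ultimately comes from this line: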

(['pet', 'feat4'], LabelEncoder()),

Even this does not work:

(['pet', 'feat4'], [LabelEncoder(), LabelEncoder()]),

You instead have to do something like this:

mapper_good = DataFrameMapper([
    (['pet'], LabelEncoder()),
    (['feat4'], LabelEncoder()),
    (['children'], StandardScaler()),
    (['salary'], MinMaxScaler())
])

np.round(mapper_good.fit_transform(data.copy()),2)
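
The difference between the two kinds of transformer can be checked without the mapper at all. The following is a minimal sketch, assuming a reasonably recent scikit-learn and using plain NumPy arrays with the question's toy values; the variable names (two_label_columns, rejected_2d, pet_codes, and so on) are made up for illustration and are not part of either library.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

# Same toy values as the question, as plain NumPy arrays.
pet = np.array(['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'])
feat4 = np.array(['linear', 'circle', 'linear', 'linear',
                  'linear', 'circle', 'circle', 'linear'])
two_label_columns = np.column_stack([pet, feat4])  # shape (8, 2)

# LabelEncoder is written for a single 1-D array of labels,
# so a two-column block is rejected with a ValueError.
try:
    LabelEncoder().fit(two_label_columns)
    rejected_2d = False
except ValueError:
    rejected_2d = True

# One encoder per column works; classes are sorted alphabetically.
pet_codes = LabelEncoder().fit_transform(pet)      # cat=0, dog=1, fish=2
feat4_codes = LabelEncoder().fit_transform(feat4)  # circle=0, linear=1

# The scalers accept 2-D input, so several columns can share them.
numeric = np.array([[4., 90], [6, 24], [3, 44], [3, 27],
                    [2, 32], [3, 59], [5, 36], [4, 27]])
scaled = MinMaxScaler().fit_transform(StandardScaler().fit_transform(numeric))
```

This is consistent with the traceback in the question, where the failure is raised from LabelEncoder.fit via column_or_1d, while the scaler pair on ['children', 'salary'] is not involved.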
jeff carey
  • Thanks @jeff carey! I suspected the error came from that line, but couldn't figure out why. I was fixated on the part of the docs that says you can apply one transformer to multiple columns. I guess that depends on the transformer... good to know! As for (['children','salary'], [StandardScaler(), MinMaxScaler()]), it actually works for multiple columns and multiple transformers in the same tuple (for these transformers at least...). Thanks again! – wi3o Jul 27 '16 at 21:36
  • Additionally, the master branch on GitHub has the feature of applying a default transformer to columns not explicitly listed in the transformer, in case this is useful to you: https://github.com/paulgb/sklearn-pandas#applying-a-default-transformer – dukebody Jul 28 '16 at 09:21