13

I am currently reading "Hands-On Machine Learning with Scikit-Learn & TensorFlow". I get an error when I try to recreate the Transformation Pipelines code. How can I fix this?

Code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([('imputer', Imputer(strategy = "median")),
                        ('attribs_adder', CombinedAttributesAdder()),
                        ('std_scaler', StandardScaler()),
                        ])

housing_num_tr = num_pipeline.fit_transform(housing_num)

from sklearn.pipeline import FeatureUnion

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
                         ('selector', DataFrameSelector(num_attribs)),
                         ('imputer', Imputer(strategy = "median")),
                         ('attribs_adder', CombinedAttributesAdder()),
                         ('std_scaler', StandardScaler()),
                        ])

cat_pipeline = Pipeline([('selector', DataFrameSelector(cat_attribs)), 
                         ('label_binarizer', LabelBinarizer()),
                        ])

full_pipeline = FeatureUnion(transformer_list = [("num_pipeline", num_pipeline), 
                                                 ("cat_pipeline", cat_pipeline),
                                                ])

# And we can now run the whole pipeline simply:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-350-3a4a39e5bc1c> in <module>()
     43 
     44 num_pipeline = Pipeline([
---> 45                          ('selector', DataFrameSelector(num_attribs)),
     46                          ('imputer', Imputer(strategy = "median")),
     47                          ('attribs_adder', CombinedAttributesAdder()),

NameError: name 'DataFrameSelector' is not defined
Stephen Rauch
Isaac A

6 Answers

21

DataFrameSelector is not being found, and will need to be imported. It is not part of sklearn, but something of the same name is available in sklearn-features:

from sklearn_features.transformers import DataFrameSelector

(DOCS)

Stephen Rauch
  • Hey there, thank you for your reply. As I read a bit further, I saw that the author defines a DataFrameSelector class. It seems to work when I put it above my code. But I get a new error: https://stackoverflow.com/questions/46162855/fit-transform-takes-2-positional-arguments-but-3-were-given-with-labelbinarize – Isaac A Jan 28 '18 at 21:57
12
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names=attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

This should work.

dr2509
    It's sometimes good to reference the source: Hands-On Machine Learning with Scikit-Learn & TensorFlow page 97 – Massoud Aug 07 '18 at 01:30
6

If you are following Hands-On Machine Learning with Scikit-Learn & TensorFlow, it's on the very next page: a custom DataFrameSelector transformer.

from sklearn.base import BaseEstimator, TransformerMixin  # base classes used below
from sklearn.pipeline import FeatureUnion
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values
Shafay
1
from sklearn.base import BaseEstimator, TransformerMixin  # base classes used below
from sklearn.pipeline import FeatureUnion
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

It may work.

Navid
0

It looks like you are working on the California housing price prediction project from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.

The error

NameError: name 'DataFrameSelector' is not defined

appeared because there is no DataFrameSelector transformer in sklearn. To fix this error, you need to write your own custom transformer.

In the book, the DataFrameSelector transformer code is on the next page, but I will also copy it below.

from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

The class defines fit() and transform() itself. It inherits fit_transform() from TransformerMixin, and get_params() and set_params() from BaseEstimator.
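
To make the role of the two base classes concrete, here is a minimal sketch that applies the selector to a small DataFrame. The `toy` data and its column values are hypothetical stand-ins for the book's housing data, not taken from the original post.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given columns of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical toy data standing in for the housing DataFrame.
toy = pd.DataFrame({
    "rooms": [3.0, 5.0, 2.0],
    "income": [1.5, 4.2, 0.9],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})

selector = DataFrameSelector(["rooms", "income"])

# fit_transform() is not defined in the class body; it comes from TransformerMixin.
selected = selector.fit_transform(toy)

# get_params() comes from BaseEstimator and reads the __init__ arguments.
params = selector.get_params()

print(selected.shape)
print(params)
```

Only the two requested columns come back, as a plain array, which is what the later pipeline steps receive as input.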

The sklearn-pandas package also provides a DataFrameMapper class with a similar purpose. You can find details about this class at the following link:
DataFrameMapper

0

Insert a cell just before your current code cell and enter the following code:

from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    # Methods must be indented inside the class body
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        return X[self.attribute_names].values

This way, your DataFrameSelector class is defined before the pipeline code uses it.
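
Why cell order matters can be sketched without scikit-learn at all. The snippet below simulates two notebook cells with `exec` in a shared namespace; the cell contents and the stripped-down class are illustrative assumptions, not the book's code.

```python
# Simulate notebook cells sharing one namespace, run in a chosen order.
namespace = {}

# Cell that uses the class (a stand-in for the pipeline cell).
use_cell = "selector = DataFrameSelector(['a', 'b'])"

# Cell that defines the class (simplified; no scikit-learn base classes).
define_cell = """
class DataFrameSelector:
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
"""

# Running the "use" cell first fails because the name does not exist yet.
try:
    exec(use_cell, namespace)
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)

# Running the definition first, then the "use" cell, succeeds.
exec(define_cell, namespace)
exec(use_cell, namespace)
print(namespace["selector"].attribute_names)
```

The same thing happens in the notebook: the definition cell has to be executed before any cell that refers to DataFrameSelector.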