name 'DataFrameSelector' is not defined

Question

I am currently reading "Hands-On Machine Learning with Scikit-Learn & TensorFlow". I get an error when I try to recreate the Transformation Pipelines code. How can I fix this?

Code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([('imputer', Imputer(strategy = "median")),
                        ('attribs_adder', CombinedAttributesAdder()),
                        ('std_scaler', StandardScaler()),
                        ])

housing_num_tr = num_pipeline.fit_transform(housing_num)

from sklearn.pipeline import FeatureUnion

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
                         ('selector', DataFrameSelector(num_attribs)),
                         ('imputer', Imputer(strategy = "median")),
                         ('attribs_adder', CombinedAttributesAdder()),
                         ('std_scaler', StandardScaler()),
                        ])

cat_pipeline = Pipeline([('selector', DataFrameSelector(cat_attribs)), 
                         ('label_binarizer', LabelBinarizer()),
                        ])

full_pipeline = FeatureUnion(transformer_list = [("num_pipeline", num_pipeline), 
                                                 ("cat_pipeline", cat_pipeline),
                                                ])

# And we can now run the whole pipeline simply:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-350-3a4a39e5bc1c> in <module>()
     43 
     44 num_pipeline = Pipeline([
---> 45                          ('selector', DataFrameSelector(num_attribs)),
     46                          ('imputer', Imputer(strategy = "median")),
     47                          ('attribs_adder', CombinedAttributesAdder()),

NameError: name 'DataFrameSelector' is not defined

score 21 · Accepted Answer · answered Jan 28 '18 at 21:52


DataFrameSelector is not being found, and will need to be imported. It is not part of sklearn, but something of the same name is available in sklearn-features:

from sklearn_features.transformers import DataFrameSelector

(DOCS)


Stephen Rauch


Hey there, thank you for your reply. As I read a bit further, the author defines a DataFrameSelector class. It seems to work once I put it above my code. But I get a new error: https://stackoverflow.com/questions/46162855/fit-transform-takes-2-positional-arguments-but-3-were-given-with-labelbinarize – Isaac A Jan 28 '18 at 21:57

score 12 · Answer 2 · answered Apr 16 '18 at 07:58


from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names=attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

This should work.
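
A quick way to check the class is to run it on a tiny DataFrame. The sketch below assumes pandas and scikit-learn are installed; the toy_housing frame and its values are made up for illustration, and only the column names echo the book's housing data.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Pick the named columns of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X[self.attribute_names].values


# Made-up rows for illustration only (not the real housing data).
toy_housing = pd.DataFrame({
    "longitude": [-122.23, -122.22, -122.24],
    "median_income": [8.3252, 8.3014, 7.2574],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND"],
})

selector = DataFrameSelector(["longitude", "median_income"])
selected = selector.fit_transform(toy_housing)

# The result is a plain NumPy array holding only the two requested columns.
print(type(selected).__name__, selected.shape)
```

Because the output is a NumPy array rather than a DataFrame, the later pipeline steps (the imputer, the scaler) receive the kind of input they expect.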


dr2509

It's sometimes good to reference the source: Hands-On Machine Learning with Scikit-Learn & TensorFlow page 97 – Massoud Aug 07 '18 at 01:30

score 6 · Answer 3 · answered Jul 17 '19 at 14:37

If you are following Hands-On Machine Learning with Scikit-Learn & TensorFlow, it is on the very next page: a custom DataFrame selector.

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

score 1 · Answer 4 · answered May 27 '19 at 13:13


from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

It may work.


Navid


You have to add the appropriate imports of BaseEstimator and TransformerMixin here. – B. Bohdan Jul 05 '19 at 04:36

score 0 · Answer 5 · answered Aug 11 '21 at 07:08

It looks like you are working on the California Housing Price Predictions project from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.

The error

NameError: name 'DataFrameSelector' is not defined

appears because there is no DataFrameSelector transformer in sklearn. To get past this error, you need to write your own custom transformer.

In the book, the DataFrameSelector transformer code is on the next page, but I will also copy it below.

from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

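BaseEstimator supplies get_params() and set_params(), and TransformerMixin supplies fit_transform(), which calls fit() and then transform(). The fit() and transform() methods themselves are the ones defined in the class above.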
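
To see what each base class contributes, here is a small check, assuming a recent scikit-learn release. The class body only writes __init__, fit and transform; the check looks at where get_params() and fit_transform() live. The column names passed to the selector are placeholders.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Placeholder column names, used only to construct the object.
selector = DataFrameSelector(["longitude", "latitude"])

# get_params() comes from BaseEstimator and reads the __init__ arguments.
params = selector.get_params()
print(params)

# fit_transform() comes from TransformerMixin, not from BaseEstimator.
mixin_has_fit_transform = hasattr(TransformerMixin, "fit_transform")
base_has_fit_transform = hasattr(BaseEstimator, "fit_transform")
print(mixin_has_fit_transform, base_has_fit_transform)
```

Having get_params() is what lets Pipeline and tools such as grid search inspect and clone the transformer, which is why the class inherits from BaseEstimator even though it learns nothing in fit().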

The sklearn-pandas package also provides a DataFrameMapper class with a similar purpose. You can find details about this class at the following link:
DataFrameMapper

score 0 · Answer 6 · answered Nov 16 '21 at 20:17

You should insert a cell just before your present code cell and type the following code into it:

from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        return X[self.attribute_names].values

This way your DataFrameSelector class is defined before the pipeline cell uses it.
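
The reason the order matters is that Python looks a name up at the moment the line using it runs. The sketch below uses plain Python only; build_selector is a hypothetical stand-in for the pipeline cell, and the class here is a stripped-down placeholder rather than the real transformer.

```python
# Hypothetical stand-in for the pipeline cell: it refers to DataFrameSelector,
# which has not been defined yet at this point.
def build_selector():
    return DataFrameSelector(["longitude", "latitude"])


try:
    build_selector()
    error_message = None
except NameError as exc:
    # Same kind of failure as in the question's traceback.
    error_message = str(exc)

print(error_message)


# Stripped-down placeholder class, defined only now.
class DataFrameSelector:
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names


# The very same call succeeds once the name is bound.
selector = build_selector()
print(selector.attribute_names)
```

In a notebook the same rule applies per cell: the cell that defines the class has to be executed before the cell that builds the pipeline, regardless of where it sits on the page.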
