
Consider the following artificial data:

import pandas as pd

data = pd.DataFrame({'pet':      ['cat', 'dog', 'dog', 'fish',
                                  'cat', 'dog', 'cat', 'fish'],
                     'children': [4., 6, 3, 3, 2, 3, 5, 4],
                     'salary':   [90., 24, 44, 27, 32, 59, 36, 27]})

In sklearn ColumnTransformer, I can drop any column I want by specifying 'drop' as the transformer as follows:

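from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

clmn_trnsfrmr = ColumnTransformer([
        ('clmn_drpr', 'drop', ['pet']),            # drop this column
        ('scale', StandardScaler(), ['salary'])],  # scale this column
    remainder='passthrough')                       # keep all other columns as-is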
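
For reference, here is a minimal runnable sketch of the ColumnTransformer behaviour described above, using the same artificial data. It assumes that "pass everything else through" is expressed with the `remainder='passthrough'` argument; the variable name `out` is only illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'pet':      ['cat', 'dog', 'dog', 'fish',
                                  'cat', 'dog', 'cat', 'fish'],
                     'children': [4., 6, 3, 3, 2, 3, 5, 4],
                     'salary':   [90., 24, 44, 27, 32, 59, 36, 27]})

clmn_trnsfrmr = ColumnTransformer(
    [('clmn_drpr', 'drop', ['pet']),             # 'pet' is removed from the output
     ('scale', StandardScaler(), ['salary'])],   # 'salary' is standardized
    remainder='passthrough')                     # unlisted columns ('children') are kept unchanged

# Result is a NumPy array: scaled 'salary' first, then the passed-through 'children'.
out = clmn_trnsfrmr.fit_transform(data)
print(out.shape)
```

The output has two columns, so 'pet' is gone while 'children' survives untouched. This is the behaviour I would like to reproduce with DataFrameMapper.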

Is there a similar way in sklearn-pandas DataFrameMapper to drop exactly the column I want?

Abhishek Bhatia

1 Answer


The documentation https://pypi.org/project/sklearn-pandas/1.5.0/ says "Only columns that are listed in the DataFrameMapper are kept. To keep a column but don’t apply any transformation to it, use None as transformer", so just don't list the column you want to get rid of.

ctenar
  • But that won't work if you intend to set `default=None` to pass through a large number of columns: it passes through every column that is not listed explicitly. I could use `default=False` to drop all unlisted columns, but with a large number of columns it would be infeasible to list each one with `None` as its transformer. – Abhishek Bhatia May 13 '20 at 14:40
  • I see. Then why not work on a DataFrame where the irrelevant columns have been dropped already? Or if they are part of other multi-column-transformations, it might be feasible to break up the transformation into several steps. – ctenar May 13 '20 at 15:03
  • I do this so that I can easily see the effect of dropping or keeping a column, and of applying a transformation to a column, within a pipeline. It effectively becomes a hyperparameter of my preprocessing pipeline in a grid search. It's just more convenient and useful this way. – Abhishek Bhatia May 13 '20 at 15:14
  • Was a solution ever found? I have also encountered a use case in which I need to drop some old features after performing operations on them to create new features. – Akshay Tilekar Jun 22 '20 at 05:57
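
One possible workaround along the lines suggested in the comments is to drop the columns in a separate pipeline step that runs before the mapper, so the dropped columns become a tunable parameter. The sketch below uses only pandas and scikit-learn. `ColumnDropper` is a hypothetical helper written for this illustration, not part of sklearn-pandas or scikit-learn, and the DataFrameMapper step that would follow it (for example with `default=None`) is left out.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical helper: drop the named DataFrame columns, keep the rest."""

    def __init__(self, columns=()):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; the columns to drop are fixed by the parameter.
        return self

    def transform(self, X):
        # Return a new DataFrame without the listed columns.
        return X.drop(columns=list(self.columns))


data = pd.DataFrame({'pet':      ['cat', 'dog', 'dog', 'fish',
                                  'cat', 'dog', 'cat', 'fish'],
                     'children': [4., 6, 3, 3, 2, 3, 5, 4],
                     'salary':   [90., 24, 44, 27, 32, 59, 36, 27]})

# A mapper with default=None could be appended as a second step (assumption).
pipe = Pipeline([('drop', ColumnDropper(columns=['pet']))])
dropped = pipe.fit_transform(data)

# Toggle the drop off through the usual step__param naming.
pipe.set_params(drop__columns=[])
kept = pipe.fit_transform(data)

print(list(dropped.columns))
print(list(kept.columns))
```

Because `columns` is an ordinary estimator parameter, a grid such as `{'drop__columns': [[], ['pet']]}` should let a grid search compare dropping against keeping the column, though that has not been tested here against DataFrameMapper itself.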