Different size of array after fit_transform

Question

I have a problem with the fit_transform function. Can someone explain why the sizes of the arrays are different?

In [5]: X.shape, test.shape

Out[5]: ((1000, 1932), (1000, 1932))

In [6]: from sklearn.feature_selection import VarianceThreshold
        sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
        features = sel.fit_transform(X)
        features_test = sel.fit_transform(test)

In [7]: features.shape, features_test.shape

Out[7]: ((1000, 1663), (1000, 1665))
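A minimal sketch that reproduces the mismatch on small hypothetical binary arrays (the toy `X` and `test` below are made up, not the asker's data). Because `fit_transform` refits the selector on each array, each call keeps whichever columns pass the threshold in that particular array.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data standing in for X and test (6 samples, 3 binary features).
# In X, column 1 is constant; in test, every column varies.
X = np.array([[0, 1, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0]])
test = np.array([[0, 0, 1],
                 [1, 1, 0],
                 [0, 0, 1],
                 [1, 1, 0],
                 [0, 0, 1],
                 [1, 1, 0]])

sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
features = sel.fit_transform(X)          # fits on X: keeps columns 0 and 2
features_test = sel.fit_transform(test)  # refits on test: keeps all 3 columns

print(features.shape, features_test.shape)  # (6, 2) (6, 3)
```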

Update: Which transformation can I use to get arrays of the same size?

For the test set, you should NOT apply fit_transform(). Your code should be features_test = sel.transform(test), as you've already figured out. WHY? https://stackoverflow.com/a/63912149/5114585 answers the 'WHY' part of applying fit_transform() versus transform(). – Dr Nisha Arora, Sep 16 '20 at 02:24

ldirer · Accepted Answer · 2015-08-31T13:57:29.663

7

It is because you are fitting your selector twice: once on X and again on test.

First, note that fit_transform is just a call to fit followed by a call to transform.
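A quick check of that equivalence, sketched on a hypothetical toy array: two fresh selectors, one using `fit_transform` and one using `fit(...).transform(...)`, produce identical output.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column 1 is constant, so it should be dropped.
X = np.array([[0, 1, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0]])

threshold = .8 * (1 - .8)
one_step = VarianceThreshold(threshold=threshold).fit_transform(X)
two_step = VarianceThreshold(threshold=threshold).fit(X).transform(X)

# Same selected columns, same values.
print(np.array_equal(one_step, two_step))  # True
```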

The fit method allows your VarianceThreshold selector to find the features it wants to keep in the dataset based on the parameters you gave it.
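To see what `fit` actually stores, here is a sketch on hypothetical boolean data. After fitting, `variances_` holds the per-column variance computed from the training array, and `get_support()` returns the boolean mask of columns whose variance exceeds the threshold; that mask is what `transform` later applies.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical boolean data: 6 samples, 4 features.
X = np.array([[0, 0, 1, 1],
              [0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 1, 0]])

sel = VarianceThreshold(threshold=.8 * (1 - .8)).fit(X)

# Per-column variances learned during fit (about 0.139, 0.25, 0.0, 0.25).
print(sel.variances_)
# Boolean mask of the columns transform() will keep.
print(sel.get_support().tolist())  # [False, True, False, True]
```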

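The transform method performs the actual feature selection and returns an array with just the selected features. When you call fit_transform on test, the selector is refit on the test data and may keep a different set of columns than it kept for X. Fit on X only, then call sel.transform(test) so both arrays keep the same columns.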
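A sketch of the fix the question's update asks for, again on hypothetical toy arrays: fit once on `X`, then only call `transform` on `test`, so both outputs keep exactly the columns selected from `X`.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column 1 is constant in X but varies in test.
X = np.array([[0, 1, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0]])
test = np.array([[0, 0, 1],
                 [1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])

sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
features = sel.fit_transform(X)      # learn the column mask from X only
features_test = sel.transform(test)  # apply the same mask to test, no refit

print(features.shape, features_test.shape)  # (4, 2) (4, 2)
```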

edited Aug 31 '15 at 13:57

answered Aug 31 '15 at 13:52

ldirer


Thank you! I changed it to `features = sel.fit(X_small).transform(X_small)` and `features_test = sel.transform(little_test)`, and it works. – Gilaztdinov Rustam Aug 31 '15 at 13:57

That's the way to go ;). You can still use `fit_transform` for the first step (`features = sel.fit(X_small).transform(X_small)` is equivalent to `features = sel.fit_transform(X_small)`). – ldirer Aug 31 '15 at 13:58

score 0 · Answer 2 · edited May 23 '17 at 12:22


Because fit_transform applies a dimensionality reduction to the array, the dimensions of the resulting arrays are not the same as those of the input.
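For `VarianceThreshold` specifically, the reduction removes columns only; the number of rows never changes. A sketch with hypothetical boolean columns: the variance of a 0/1 feature with a fraction p of ones is p(1 - p), so the threshold `.8 * (1 - .8)` (about 0.16) drops features that take the same value in more than 80% of the samples.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical boolean data: 10 samples, 3 features with different fractions of ones.
X = np.zeros((10, 3), dtype=int)
X[:9, 0] = 1  # 90% ones -> variance 0.9 * 0.1 = 0.09 (below the threshold)
X[:5, 1] = 1  # 50% ones -> variance 0.5 * 0.5 = 0.25
X[:3, 2] = 1  # 30% ones -> variance 0.3 * 0.7 = 0.21

p = X.mean(axis=0)
bernoulli_var = p * (1 - p)  # variance of each 0/1 column

sel = VarianceThreshold(threshold=.8 * (1 - .8))
reduced = sel.fit_transform(X)

print(X.shape, reduced.shape)  # (10, 3) (10, 2)
```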

See this question, "What is the difference between 'transform' and 'fit_transform' in sklearn", and this page: http://scikit-learn.org/stable/modules/feature_extraction.html

edited May 23 '17 at 12:22

Community


answered Aug 31 '15 at 12:56

Semih Yagcioglu


Which transformation can help me get arrays of the same size? – Gilaztdinov Rustam Aug 31 '15 at 13:44
