
I was studying the "In Depth: k-Means Clustering" section of Jake VanderPlas's Python Data Science Handbook and came across the following code block:

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

digits = load_digits()

# Project the data: this step will take several seconds
tsne = TSNE(n_components=2, init='random', random_state=0)
digits_proj = tsne.fit_transform(digits.data)

# Compute the clusters
kmeans = KMeans(n_clusters=10, random_state=0)
clusters = kmeans.fit_predict(digits_proj)

This code from the book (I've changed it slightly to make my question clearer; the original is at the first link) runs without any problems. I then tried to build a pipeline from the same models using sklearn.pipeline.Pipeline. Here is my code:

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline

digits = load_digits()

estimators = [('reduce_dim', TSNE(n_components=2, init='random', random_state=0)),
              ('cls', KMeans(n_clusters=10, random_state=0))]

pipe = Pipeline(estimators)

clusters = pipe.fit_predict(digits.data)

Every time I run this code, I get the following error message:

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'TSNE(init='random', random_state=0)' (type <class 'sklearn.manifold._t_sne.TSNE'>) doesn't

The thing is, as far as I can tell, the TSNE() step is a transformer and does implement fit and transform. The original code from the book seems to show this:

tsne = TSNE(n_components=2, init='random', random_state=0)
digits_proj = tsne.fit_transform(digits.data)

But the error message says the opposite. I read the scikit-learn documentation (specifically, the sklearn.manifold.TSNE page and "Pipelines and composite estimators"), but it did not help me solve the problem. What do you think the problem is?
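
To narrow this down, here is a minimal sketch that checks which methods each estimator actually exposes. The helper names `exposed_methods` and `satisfies_message` are hypothetical ones made up for this check; `satisfies_message` only mirrors the wording of the error message (a step must have both `fit` and `transform`) and is not scikit-learn's own validation code.

```python
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def exposed_methods(estimator, names=("fit", "transform", "fit_transform")):
    """Hypothetical helper: report which of the named methods the estimator has."""
    return {name: hasattr(estimator, name) for name in names}


def satisfies_message(estimator):
    """Hypothetical helper: the condition as worded in the error message,
    i.e. the step exposes both `fit` and `transform`. This is an assumption
    based on the message text, not a copy of the Pipeline internals."""
    return hasattr(estimator, "fit") and hasattr(estimator, "transform")


# Constructing the estimators does not fit anything, so this runs instantly.
tsne = TSNE(n_components=2, init='random', random_state=0)
kmeans = KMeans(n_clusters=10, random_state=0)

tsne_methods = exposed_methods(tsne)
kmeans_methods = exposed_methods(kmeans)

print("TSNE:  ", tsne_methods, "-> usable as intermediate step?", satisfies_message(tsne))
print("KMeans:", kmeans_methods, "-> usable as intermediate step?", satisfies_message(kmeans))
```

If I understand the scikit-learn API correctly, this reports that TSNE exposes fit and fit_transform but no separate transform method, whereas KMeans exposes all three. That might be what the pipeline is objecting to, since the book's code only ever calls fit_transform on the TSNE object.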
