I was studying the "In Depth: k-Means Clustering" section of Jake VanderPlas's Python Data Science Handbook and came across the following code block:
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
digits = load_digits()
# Project the data: this step will take several seconds
tsne = TSNE(n_components=2, init='random', random_state=0)
digits_proj = tsne.fit_transform(digits.data)
# Compute the clusters
kmeans = KMeans(n_clusters=10, random_state=0)
clusters = kmeans.fit_predict(digits_proj)
This code from the book runs without any problems (I've changed it slightly to make my question clearer; the original is at the first link). I then tried to build a pipeline with sklearn.pipeline.Pipeline using the same models. Here is my code:
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
digits = load_digits()
estimators = [
    ('reduce_dim', TSNE(n_components=2, init='random', random_state=0)),
    ('cls', KMeans(n_clusters=10, random_state=0)),
]
pipe = Pipeline(estimators)
clusters = pipe.fit_predict(digits.data)
Every time I run my code, I get this error message:
TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'TSNE(init='random', random_state=0)' (type <class 'sklearn.manifold._t_sne.TSNE'>) doesn't
As far as I can tell, the TSNE() step is a transformer and implements both fit and transform. I am inferring this from the book's original code, which calls fit_transform on it:
tsne = TSNE(n_components=2, init='random', random_state=0)
digits_proj = tsne.fit_transform(digits.data)
But the error message says the opposite. I read the scikit-learn docs (specifically, sklearn.manifold.TSNE and Pipelines and composite estimators), but they did not help me solve the problem. What do you think the problem is?
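One thing that might be worth checking is whether having fit_transform really implies having a separate transform method. The sketch below does not use scikit-learn at all. The classes FitTransformOnly and FullTransformer and the function looks_like_intermediate_step are hypothetical stand-ins made up for illustration, and the check is only a simplified reading of the condition quoted in the error message, not necessarily what Pipeline does internally.

```python
class FitTransformOnly:
    """Hypothetical stand-in: offers fit and fit_transform, but no transform."""

    def fit(self, X, y=None):
        return self

    def fit_transform(self, X, y=None):
        # Pretend embedding: hand the input back unchanged.
        return X


class FullTransformer(FitTransformOnly):
    """Hypothetical stand-in that also offers a separate transform method."""

    def transform(self, X):
        return X


def looks_like_intermediate_step(step):
    """Simplified reading of the error message (an assumption, not library code):
    a step needs fit (or fit_transform) and, separately, transform."""
    has_fit = hasattr(step, "fit") or hasattr(step, "fit_transform")
    return has_fit and hasattr(step, "transform")


# A working fit_transform call does not make the check pass on its own.
print(looks_like_intermediate_step(FitTransformOnly()))  # False
print(looks_like_intermediate_step(FullTransformer()))   # True
```

Running hasattr(TSNE(), 'transform') on the real class would show whether this is what happens in my pipeline. If it returns False, that would be consistent with the error even though fit_transform works.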