
I have a set of vectors that I want to cluster with simple K-Means, but first I need to find the optimal number of clusters k with the elbow method. I use KElbowVisualizer from the Yellowbrick package for this. The problem is that I have 569 vectors and the KElbowVisualizer plot is not big enough to read, so I cannot see which k is best.

I looked for a way to set the plot size, but it didn't work. Here is the plot result: [image]

and here is my code:

from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from yellowbrick.cluster import KElbowVisualizer

vec = TfidfVectorizer(
    stop_words='english',
    use_idf=True
)

vectors_ = vec.fit_transform(df1)

model = MiniBatchKMeans()
titleKElbow = "The Optimal K-Cluster with Elbow Method"
visualizer = KElbowVisualizer(
    model,
    k=(2, 30),
    metric='silhouette',
    timings=False,
    title=titleKElbow,
    size=(1080, 720)
)
visualizer.fit(vectors_)
visualizer.show(outpath="G:/My Drive/0. Thesis/Results/kelbow_minibatchkmeans.pdf")

I could not even save it to my directory with the last line of my code. Does anybody have any idea how to fix it? Thanks

  • Are you on the latest version of Yellowbrick? `pip install -U yellowbrick` – rebeccabilbro Feb 06 '20 at 13:17
  • It works now, @rebeccabilbro. But can we get the exact number on the x-axis? I have 569 rows, which means I will have 569 numbers on the x-axis. – Jack Zaki Zakiul Fahmi Jailani Feb 06 '20 at 15:30
  • Yes, assuming you have instantiated your `KElbowVisualizer` using the parameter `locate_elbow=True`, once you have called `visualizer.fit()` you can retrieve the best k value and the score at that k using `visualizer.elbow_value_` and `visualizer.elbow_score_`, respectively – rebeccabilbro Feb 12 '20 at 20:14

1 Answer


Answer: just install the latest version of Yellowbrick with `pip install -U yellowbrick`.

Don't forget to set the size of the KElbowVisualizer plot (the `size` parameter, in pixels) so you can see the optimum k in detail.