I'm trying to cluster Twitter data using k-means to show the main topics discussed in a dataset. I currently have a CSV file that has been cleaned and tokenised, with stop words removed.
I am now trying to apply k-means through a simple GUI, which I eventually want to use to visualise the results. The code now runs, but it only creates one cluster, with the contents "text". How do I create multiple clusters?
My code:
    def k_means_clustering(self):
        df = pd.read_csv("test_data.csv")
        vectorizer = TfidfVectorizer(stop_words='english')
        X = vectorizer.fit_transform(df)
        true_k = 1
        model = KMeans(n_clusters=true_k, init='k-means++', max_iter=100, n_init=1)
        model.fit(X)
I used this question to try and apply k-means: "Clustering text documents using scikit-learn kmeans in Python".