I have finished implementing traditional k-means text clustering. Now I need to change my program to use "spherical k-means text clustering", but I have not succeeded so far.
I've searched several sites for solutions but still cannot revise my program successfully. The following resources should be helpful for my project, but I still cannot figure out how to apply them.
This is my traditional K-means program:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import joblib  # store model (sklearn.externals.joblib was removed in newer scikit-learn releases)
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(tag_document)  # tag_document is a list that contains many strings
true_k = 3  # assume that I want to have 3 clusters
model = KMeans(n_clusters=true_k, init='k-means++', max_iter=100, n_init=1)
model.fit(X)
#store
joblib.dump(model,'save/cluster.pkl')
#restore
clu2 = joblib.load('save/cluster.pkl')
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
terms = vectorizer.get_feature_names_out()  # use get_feature_names() on scikit-learn older than 1.0
My goal is to cluster the text documents with "spherical k-means clustering" instead.
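To make the target concrete, here is a minimal sketch of what I understand spherical k-means to be: every document vector is scaled to unit length, documents are assigned to the centroid with the highest cosine similarity, and each centroid is re-normalized to unit length after every update. The function name `spherical_kmeans` and its parameters are my own (hypothetical, not a scikit-learn API), it uses plain random initialization rather than k-means++, and it assumes a dense NumPy array as input.

```python
import numpy as np


def spherical_kmeans(X, n_clusters, max_iter=100, seed=0):
    """Cluster the rows of a dense array X by cosine similarity.

    Hypothetical helper for illustration; not part of scikit-learn.
    Returns (labels, centers), where every center has unit length.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Project every document vector onto the unit sphere.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    X = X / norms

    # Start from randomly chosen documents as centroids.
    start = rng.choice(X.shape[0], size=n_clusters, replace=False)
    centers = X[start].copy()
    labels = np.full(X.shape[0], -1)

    for _ in range(max_iter):
        # On unit vectors the dot product equals the cosine similarity.
        new_labels = (X @ centers.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

        for j in range(n_clusters):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the old centroid for an empty cluster
            total = members.sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:
                # Re-normalize so the centroid stays on the unit sphere.
                centers[j] = total / length

    return labels, centers
```

With the variables from my program this would be called as `labels, centers = spherical_kmeans(X.toarray(), true_k)`, and `centers.argsort()[:, ::-1]` would then play the role of `order_centroids`. Note that `X.toarray()` makes the TF-IDF matrix dense, which may not fit in memory for a large vocabulary. `TfidfVectorizer` already L2-normalizes each row by default (`norm='l2'`), so the part that differs from my current `KMeans` code is the cosine-based assignment and the centroid re-normalization.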