Questions tagged [k-means]

k-means is a clustering algorithm, implemented in popular data science tools. Use this tag for questions related to the k-means clustering algorithm itself, or to its use with the tools that implement it (alongside other tags specific to those tools).

In statistics and data mining, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, minimizing the within-cluster sum of squared deviations.

For details, see the Wikipedia entry at http://en.wikipedia.org/wiki/K-means_clustering
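As a rough illustration of the definition above, here is a minimal pure-Python sketch of the standard iterative (Lloyd-style) procedure: assign each point to its nearest mean, recompute the means, and repeat until the assignments stop changing. The function names, the random initialization from the data, and the `seed` and `max_iter` parameters are assumptions made for this example; library implementations differ in initialization and stopping rules.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize with k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

Each pass costs roughly n * k distance computations, and the result can depend on the initial centroids, which is why tools commonly rerun the procedure from several starting points.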

3514 questions

1 answer

Cluster one-dimensional data optimally?

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works? Or: what is the optimal way to do k-means clustering in one dimension?

r cluster-analysis k-means

asked Oct 23 '11 at 22:12

Laciel

3 answers

Understanding "score" returned by scikit-learn KMeans

I applied clustering on a set of text documents (about 100). I converted them to TF-IDF vectors using TfidfVectorizer and supplied the vectors as input to sklearn.cluster.KMeans(n_clusters=2, init='k-means++', max_iter=100, n_init=10). Now when…

python scikit-learn k-means

asked Sep 03 '15 at 08:23

Prateek Dewan

2 answers

Scikit-learn: How to run KMeans on a one-dimensional array?

I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it seems KMeans works with a multidimensional array…

python scikit-learn data-mining k-means

asked Feb 09 '15 at 18:08

Irene

4 answers

What is the difference between the "k-means" and "fuzzy c-means" objective functions?

I am trying to see whether the performance of the two can be compared based on the objective functions they work on.

cluster-analysis k-means fuzzy-c-means

asked Feb 27 '10 at 01:37

n0ob

1 answer

Online k-means clustering

Is there an online version of the k-means clustering algorithm? By online I mean that every data point is processed serially, one at a time as it enters the system, hence saving computing time when used in real time. I have written one myself with…

cluster-analysis k-means

asked Sep 13 '10 at 07:33

Theodor
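For readers wondering what "online" might look like here, the following is a minimal sketch of one common sequential variant (often attributed to MacQueen): each incoming point is assigned to its nearest centroid, and only that centroid is nudged toward the point by a running-mean update. This is not the asker's code; the function name `online_kmeans` and the choice to seed the centroids with the first k points are assumptions made for illustration.

```python
def online_kmeans(stream, k):
    """Illustrative sequential k-means over an iterable of tuples.

    Returns (centroids, counts). Assumption: the first k points seen
    become the initial centroids.
    """
    centroids = []
    counts = []
    for x in stream:
        if len(centroids) < k:
            # Seed phase: take the first k points as centroids.
            centroids.append(list(x))
            counts.append(1)
            continue
        # Find the nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Running-mean update: the centroid stays the mean of its points.
        counts[j] += 1
        step = 1.0 / counts[j]
        centroids[j] = [c + step * (a - c) for a, c in zip(x, centroids[j])]
    return centroids, counts


if __name__ == "__main__":
    points = [(0.0,), (10.0,), (1.0,), (11.0,), (2.0,), (12.0,)]
    print(online_kmeans(points, k=2))
```

Each point is touched once and costs k distance computations, but earlier points are never reassigned, so the result depends on arrival order and is generally not the same as the batch solution.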

5 answers

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

I have a data table ("norm") containing numeric (at least as far as I can see) normalized values of the following form: When I execute k <- kmeans(norm,center=3) I receive the following error: Error in do_one(nmeth) : NA/NaN/Inf in…

r machine-learning cluster-analysis data-mining k-means

asked Apr 07 '16 at 07:40

Jonathan Rhein

6 answers

Fast (< n^2) clustering algorithm

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be bounding spheres with a specified radius). That means that there probably has…

algorithm machine-learning cluster-analysis data-mining k-means

asked Dec 09 '10 at 23:11

John Hawksley

1 answer

Clustering text documents using scikit-learn kmeans in Python

I need to use scikit-learn's KMeans for clustering text documents. The example code works fine as it is but takes some 20newsgroups data as input. I want to use the same code for clustering a list of documents as shown below: documents =…

python python-2.7 scikit-learn cluster-analysis k-means

asked Jan 11 '15 at 17:20

Nabila Shahid

2 answers

Estimation of number of Clusters via gap statistics and prediction strength

I am trying to translate the R implementations of the gap statistic and prediction strength http://edchedch.wordpress.com/2011/03/19/counting-clusters/ into Python scripts for estimating the number of clusters in the iris data with 3 clusters. Instead…

python r cluster-analysis k-means

asked Jan 08 '14 at 17:39

Riyaz

2 answers

What is the time complexity of k-means?

I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total elements, k = number of clusters, i = number of iterations). So can someone explain this statement from Wikipedia to me, and how is this NP-hard? If k…

algorithm time-complexity k-means

asked Sep 05 '13 at 10:41

parallel

2 answers

Group n points in k clusters of equal size

Possible Duplicate: K-means algorithm variation with equal cluster size EDIT: as casperOne pointed out to me, this question is a duplicate. Anyway, here is a more general question that covers this one:…

algorithm cluster-analysis k-means

asked Jan 09 '12 at 23:30

Pierre-David Belanger

2 answers

K-Means: Lloyd, Forgy, MacQueen, Hartigan-Wong

I'm working with the K-Means algorithm in R and I want to figure out the differences between the 4 algorithms Lloyd, Forgy, MacQueen and Hartigan-Wong, which are available for the function "kmeans" in the stats package. However, I was not able to get a…

r algorithm k-means

asked Dec 07 '13 at 20:11

user2974776

4 answers

Changes of clustering results after each time run in Python scikit-learn

I have a bunch of sentences and I want to cluster them using scikit-learn spectral clustering. I've run the code and get the results with no problem. But every time I run it I get different results. I know this is a problem with initialization but I…

python scikit-learn cluster-analysis k-means spectral

asked Sep 18 '14 at 20:28

user3430235

3 answers

Using K-means with cosine similarity - Python

I am trying to implement the k-means algorithm in Python using cosine distance instead of Euclidean distance as the distance metric. I understand that using a different distance function can be fatal and should be done carefully. Using cosine distance…

python scikit-learn k-means cosine-similarity sklearn-pandas

asked Sep 25 '17 at 16:22

ise372

3 answers

kmeans scatter plot: plot different colors per cluster

I am trying to do a scatter plot of a kmeans output which clusters sentences of the same topic together. The problem I am facing is plotting the points that belong to each cluster in a certain color. sentence_list=["Hi how are you", "Good morning" ...] #i…

python numpy matplotlib scipy k-means

asked Jan 30 '15 at 00:36

jxn
