Questions tagged [k-means]

k-means is a clustering algorithm, implemented in popular data science tools. Use this tag for questions related to the k-means clustering algorithm itself, or to its use with the tools that implement it (alongside other tags specific to those tools).

In statistics and data mining, k-means clustering is a method of cluster analysis that partitions n observations into k clusters, with each observation assigned to the cluster whose mean is nearest. The objective is to minimize the within-cluster sum of squared deviations.

For details, see the Wikipedia entry: http://en.wikipedia.org/wiki/K-means_clustering
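
As a rough illustration of the definition above, here is a minimal sketch of the standard iteration (often called Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the sample points are made up for this example; real libraries add better initialization and stricter convergence checks.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (illustrative only)."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical sample data: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
```

The result depends on the random starting centroids, so different seeds can give different partitions; that is why library implementations usually run several restarts and keep the best one.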

3514 questions
16
votes
2 answers

How to detect multiple objects with OpenCV in C++?

I was inspired by this answer, which is a Python implementation, but I need C++. That answer works very well. The idea is: use detectAndCompute to get keypoints, use kmeans to segment them into clusters, then for each cluster do…
Suge
  • 2,808
  • 3
  • 48
  • 79
16
votes
2 answers

How to use silhouette score in k-means clustering from sklearn library?

I'd like to use the silhouette score in my script to automatically compute the number of clusters in k-means clustering from sklearn. import numpy as np import pandas as pd import csv from sklearn.cluster import KMeans from sklearn.metrics import…
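
One common way to approach what this question asks is to fit k-means for a range of k and keep the k with the highest mean silhouette score. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_blobs` in place of the asker's dataset, which is not shown in the excerpt. The range of k and the variable names are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption): three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.6,
    random_state=0,
)

scores = {}
for k in range(2, 7):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k whose clustering has the highest mean silhouette score.
best_k = max(scores, key=scores.get)
```

The silhouette score is only a heuristic; on data without clear cluster structure the maximum can be flat or misleading, so it is worth inspecting the whole `scores` dictionary rather than only `best_k`.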
16
votes
1 answer

Using a smoother with the L Method to determine the number of K-Means clusters

Has anyone tried to apply a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results? Or allow a lower number of k-means trials and hence much greater…
winwaed
  • 7,645
  • 6
  • 36
  • 81
16
votes
1 answer

Using dplyr and broom to compute kmeans on a training and test set

I am using dplyr and broom to compute kmeans for my data. My data contains a test and training set of X and Y coordinates and is grouped by some parameter value (lambda in this case): mds.test = data.frame() for(l in seq(0.1, 0.9, by=0.2)) { …
user2117258
  • 515
  • 4
  • 18
15
votes
1 answer

initial centroids for scikit-learn kmeans clustering

If I already have a numpy array that can serve as the initial centroids, how can I properly initialize the kmeans algorithm? I am using the scikit-learn KMeans class. This post (k-means with selected initial centers) indicates that I only need to set…
webmaker
  • 456
  • 1
  • 5
  • 15
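
For the question above, a minimal sketch of passing explicit starting centroids to scikit-learn's `KMeans` through its `init` parameter might look like the following. The data and centroid values are invented for illustration; `n_init=1` is set because repeated restarts make no sense when the starting point is fixed.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two loose groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# Initial centroids must have shape (n_clusters, n_features).
initial_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

km = KMeans(
    n_clusters=initial_centroids.shape[0],
    init=initial_centroids,
    n_init=1,  # a single run, since the start is not random
)
km.fit(X)

labels = km.labels_            # cluster index per row of X
centers = km.cluster_centers_  # final centroids after convergence
```

The supplied array is only a starting point: the algorithm still iterates, so `cluster_centers_` will generally differ from `initial_centroids`.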
15
votes
2 answers

How to identify Cluster labels in kmeans scikit learn

I am learning python scikit. The example given here displays the top occurring words in each Cluster and not Cluster name. http://scikit-learn.org/stable/auto_examples/document_clustering.html I found that the km object has "km.label" which lists…
vij555
  • 329
  • 1
  • 2
  • 10
15
votes
1 answer

What is the difference between SOM (Self Organizing Maps) and K-Means?

There is only one question related to this on Stack Overflow, and it is more about which one is better. I just don't really understand the difference. I mean they both work with vectors, which are assigned randomly to clusters, they both work with the…
15
votes
8 answers

k-means empty cluster

I am trying to implement k-means as a homework assignment. My exercise sheet gives the following remark regarding empty centers: During the iterations, if any of the cluster centers has no data points associated with it, replace it with a random data…
toobee
  • 2,592
  • 4
  • 26
  • 35
14
votes
1 answer

How to specify a distance metric for kmeans in R?

I'm doing kmeans clustering in R with two requirements: I need to specify my own distance function (currently the Pearson coefficient), and I want the clustering to use the average of group members as centroids rather than some actual member. The reason for…
Derrick Zhang
  • 21,201
  • 18
  • 53
  • 73
14
votes
4 answers

Reading wav file in Java

I want to read wav files in Java and I am going to classify them with K-means. How can I read wav files in Java and assign them to an array or something like that (you can suggest ideas for it) to classify them? EDIT: I want to use APIs for reading…
kamaci
  • 72,915
  • 69
  • 228
  • 366
14
votes
1 answer

k-means with selected initial centers

I am trying to do k-means clustering with selected initial centroids. It says here that to specify your initial centers: init : {‘k-means++’, ‘random’ or an ndarray} If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives…
lel
  • 163
  • 1
  • 1
  • 11
14
votes
1 answer

How to Find Documents That are in the same Cluster with KMeans

I have clustered various articles together with the Scikit-learn framework. Below are the top 15 words in each cluster: Cluster 0: whales islands seaworld hurricane whale odile storm tropical kph mph pacific mexico orca coast cabos Cluster 1: ebola…
Stunner
  • 12,025
  • 12
  • 86
  • 145
13
votes
2 answers

k-means return value in R

I am using the kmeans() function in R and I was curious what is the difference between the totss and tot.withinss attributes of the returned object. From the documentation they seem to be returning the same thing, but applied on my dataset the value…
Marius
  • 990
  • 1
  • 14
  • 34
13
votes
3 answers

Where to find a reliable K-medoid (not k-means) open source software/tool?

I am learning the K-medoids algorithm, so I am sorry if I ask inappropriate questions. As far as I know, the K-medoids algorithm performs a K-means-style clustering but uses actual data points as centroids instead of mathematically calculated means. As I googled…
Cassie
  • 1,179
  • 6
  • 18
  • 30
13
votes
2 answers

Weka simple K-means clustering assignments

I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry. I am using…
machine yearning
  • 9,889
  • 5
  • 38
  • 51