Questions tagged [k-means]

k-means is a clustering algorithm, implemented in popular data science tools. Use this tag for questions related to the k-means clustering algorithm itself, or to its use with the tools that implement it (alongside other tags specific to those tools).

In statistics and data mining, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters, assigning each observation to the cluster with the nearest mean so that the total within-cluster sum of squared deviations is minimized.
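The objective can be made concrete with a minimal NumPy sketch of the standard iterative heuristic (Lloyd's algorithm). It alternates an assignment step (each point goes to its nearest centroid) and an update step (each centroid moves to the mean of its points). The function name `kmeans`, the random initialisation and the `n_iter`/`seed` parameters are illustrative assumptions, not any particular library's API. Like all Lloyd-style runs, it only reaches a local optimum that depends on the starting centroids.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm sketch of k-means (not any library's implementation)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to the centroid with the smallest squared distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment and objective: within-cluster sum of squared deviations.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2.min(axis=1).sum()
    return labels, centroids, inertia
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` separates the two pairs of points, and the returned objective is the sum of squared distances from each point to its cluster mean.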

For details, see the Wikipedia entry: http://en.wikipedia.org/wiki/K-means_clustering

3514 questions
22 votes, 2 answers

How does pytorch backprop through argmax?

I'm building k-means in PyTorch using gradient descent on centroid locations instead of expectation-maximisation. The loss is the sum of squared distances from each point to its nearest centroid. To identify which centroid is nearest to each point, I use…
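The setup described in this question can be sketched without PyTorch to show where the gradient actually goes. The nearest-centroid index is an integer and piecewise constant, so it contributes no gradient; only the selected squared distances depend on the centroids. In PyTorch, building the loss from `d2.min(dim=1).values` (or from `d2` indexed with the argmin indices) has the same effect: autograd routes the gradient only into the selected entries. The NumPy sketch below computes that gradient by hand. The helper name `loss_and_grad`, the starting centroids and the learning rate are hypothetical choices for illustration.

```python
import numpy as np


def loss_and_grad(X, C):
    """Sum of squared distances to the nearest centroid, and its gradient w.r.t. C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n_points, k)
    nearest = d2.argmin(axis=1)  # integer indices: piecewise constant, so no gradient
    loss = d2[np.arange(len(X)), nearest].sum()
    grad = np.zeros_like(C)
    # Only the selected distance terms depend on C, so each centroid is pulled
    # solely by the points currently assigned to it: dL/dc_j = sum of 2 (c_j - x_i).
    np.add.at(grad, nearest, 2.0 * (C[nearest] - X))
    return loss, grad


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
C = np.array([[1.0, 1.0], [9.0, 9.0]])  # hypothetical starting centroids
for _ in range(200):
    loss, grad = loss_and_grad(X, C)
    C -= 0.1 * grad  # plain gradient-descent step; learning rate chosen for illustration
```

With this data the centroids converge to the cluster means, which is also the fixed point of the expectation-maximisation style update.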
22 votes, 6 answers

scikit-learn: Finding the features that contribute to each KMeans cluster

Say you have 10 features you are using to create 3 clusters. Is there a way to see how much each feature contributes to each cluster? What I want to be able to say is that for cluster k1, features 1, 4, 6 were the primary…
cmgerber
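For the feature-contribution question above, one simple heuristic (an assumption, not a built-in scikit-learn feature) is to standardise every feature and then inspect each cluster's mean in z-score units. Features whose cluster mean lies far from the overall mean are the ones that most distinguish that cluster. With scikit-learn, the `labels` argument could be `km.labels_` from a fitted `KMeans`. The function name `feature_contributions` and the `top` parameter are hypothetical.

```python
import numpy as np


def feature_contributions(X, labels, top=3):
    """Rank features per cluster by |cluster mean| in standardised (z-score) units."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # constant features: avoid division by zero (their z-scores are 0)
    Z = (X - mu) / sd
    result = {}
    for j in np.unique(labels):
        centroid_z = Z[labels == j].mean(axis=0)
        order = np.argsort(-np.abs(centroid_z))[:top]  # largest deviations first
        result[int(j)] = [(int(f), float(centroid_z[f])) for f in order]
    return result
```

The sign of each value shows whether the cluster sits above or below the overall mean for that feature.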
22 votes, 7 answers

Can k-means clustering do classification?

I want to know whether the k-means clustering algorithm can do classification. Suppose I have done a simple k-means clustering: I have a lot of data, I run k-means on it, and I get 2 clusters, A and B, and the centroid calculating method is…
Sirius Wang
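For the classification question above: once clusters A and B exist, a new point can be assigned with the same rule k-means uses in its assignment step, namely the nearest centroid. Clustering is unsupervised, so mapping cluster indices to class names is something the analyst must supply. The centroids, names and helper function below are hypothetical.

```python
import numpy as np


def assign_to_nearest_centroid(points, centroids):
    """Return the index of the nearest centroid for each point (squared Euclidean distance)."""
    P = np.atleast_2d(np.asarray(points, dtype=float))
    C = np.asarray(centroids, dtype=float)
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


centroids = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical centroids of clusters A and B
cluster_names = {0: "A", 1: "B"}  # hypothetical mapping chosen by the analyst
new_points = [[1.0, 2.0], [9.0, 8.0]]
predicted = [cluster_names[int(i)] for i in assign_to_nearest_centroid(new_points, centroids)]
```

Here `predicted` is `["A", "B"]`. Whether these clusters correspond to meaningful classes depends entirely on the data.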
21 votes, 5 answers

How can I perform K-means clustering on time series data?

How can I do k-means clustering of time series data? I understand how this works when the input data is a set of points, but I don't know how to cluster time series of size 1×M, where M is the data length. In particular, I'm not sure how to update…
Jaz
21 votes, 3 answers

How would I implement k-means with TensorFlow?

The intro tutorial, which uses the built-in gradient descent optimizer, makes a lot of sense. However, k-means isn't just something I can plug into gradient descent. It seems like I'd have to write my own sort of optimizer, but I'm not quite sure…
19 votes, 4 answers

ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

I am trying to calculate silhouette score as I find the optimal number of clusters to create, but get an error that says: ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) I am unable to understand the reason for…
Suhail Gupta
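The error in this question means the labels passed to the score contained only one distinct cluster. A common cause is starting the search at k = 1, or a run in which every point ended up in the same cluster. The silhouette is only defined for 2 to n_samples - 1 clusters. The minimal NumPy sketch below computes the mean silhouette coefficient with Euclidean distances and performs the same range check. It is a sketch, not scikit-learn's `silhouette_score` itself, and the function name `silhouette` is hypothetical.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances (minimal sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n = len(X)
    # The score is only defined for 2 <= number of clusters <= n_samples - 1.
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError(
            f"Number of labels is {len(clusters)}. "
            "Valid values are 2 to n_samples - 1 (inclusive)"
        )
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a point alone in its cluster gets s = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to the rest of its cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()
```

When scanning candidate values of k, start the loop at 2 (for example `range(2, k_max + 1)`) so that a single-cluster labelling is never scored.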
19 votes, 1 answer

How to add k-means predicted clusters in a column to a dataframe in Python

I have a question about k-means clustering in Python. I did the analysis this way: from sklearn.cluster import KMeans km = KMeans(n_clusters=12, random_state=1) new =…
Keithx
19 votes, 3 answers

plot a document tfidf 2D graph

I would like to plot a 2D graph with the x-axis as term and the y-axis as TF-IDF score (or document id) for my list of sentences. I used scikit-learn's fit_transform() to get the SciPy matrix, but I do not know how to use that matrix to plot the graph. I…
jxn
19 votes, 2 answers

Clustering geo location coordinates (lat,long pairs) using KMeans algorithm with Python

Using the following code to cluster geolocation coordinates results in 3 clusters: import numpy as np import matplotlib.pyplot as plt from scipy.cluster.vq import kmeans2, whiten coordinates= np.array([ [lat, long], …
rok
19 votes, 5 answers

How to calculate BIC for k-means clustering in R

I've been using k-means to cluster my data in R, but I'd like to assess the fit vs. model complexity of my clustering using the Bayesian Information Criterion (BIC) and AIC. Currently the code I've been using in R is: KClData <- kmeans(Data,…
UnivStudent
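For the BIC question above: k-means is not a probabilistic model, so BIC is not strictly defined for it. One frequently used rough approximation treats the total within-cluster sum of squares (the `tot.withinss` component returned by R's `kmeans()`) as a deviance D and counts the k × m centroid coordinates as parameters, where m is the number of dimensions and n the number of observations. This gives AIC = D + 2km and BIC = D + ln(n)km. The sketch below writes that heuristic in Python (the function name is hypothetical). Treat it as a relative comparison across values of k, not as a likelihood-based criterion.

```python
import numpy as np


def kmeans_aic_bic(X, labels, centroids):
    """Rough AIC/BIC heuristic for a k-means fit: deviance = within-cluster sum of squares."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    C = np.asarray(centroids, dtype=float)
    n, m = X.shape  # observations, dimensions
    k = len(C)  # number of clusters
    D = ((X - C[labels]) ** 2).sum()  # corresponds to R's tot.withinss
    aic = D + 2 * m * k
    bic = D + np.log(n) * m * k
    return aic, bic
```

Lower values are preferred when comparing fits on the same data. Because D is not scaled by a noise variance, the penalty's weight depends on the units of the data.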
18 votes, 3 answers

OpenCV using k-means to posterize an image

I want to posterize an image with k-means and OpenCV's C++ interface (cv namespace), and I get weird results. I need it to reduce some noise. This is my code: #include "cv.h" #include "highgui.h" using namespace cv; int main() { Mat imageBGR,…
nkint
18 votes, 7 answers

setting an array element with a sequence requested array has an inhomogeneous shape after 1 dimensions The detected shape was (2,)+inhomogeneous part

import os import numpy as np from scipy.signal import * import csv import matplotlib.pyplot as plt from scipy import signal from brainflow.board_shim import BoardShim, BrainFlowInputParams, LogLevels, BoardIds from brainflow.data_filter import…
ILovePhysics
17 votes, 4 answers

Can I use K-means algorithm on a string?

I am working on a Python project where I study RNA structure evolution (represented as a string, for example "(((...)))", where the parentheses represent base pairs). The point is that I have an ideal structure and a population that evolves…
Doni
17 votes, 2 answers

KMeans clustering in PySpark

I have a Spark DataFrame 'mydataframe' with many columns. I am trying to run k-means on only two columns, lat and long (latitude and longitude), using them as simple values. I want to extract 7 clusters based on just those 2 columns and then I want to…
17 votes, 2 answers

How to set k-Means clustering labels from highest to lowest with Python?

I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon and evening. I am trying to cluster this dataset using the k-means implementation from scikit-learn, and am getting some interesting results. First…
Sergio
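For the relabelling question above: k-means cluster indices are arbitrary, so one way to order them is to sort the clusters by a summary of their centroids and remap the labels. In the sketch below, the sort key (the sum of each centroid's coordinates, for example total morning + afternoon + evening consumption) is an assumption; any other per-centroid score could be used instead. With scikit-learn, the inputs could be `km.labels_` and `km.cluster_centers_`. The function name is hypothetical.

```python
import numpy as np


def relabel_by_centroid(labels, centroids):
    """Renumber clusters so that label 0 has the highest centroid total, 1 the next, and so on."""
    labels = np.asarray(labels)
    C = np.asarray(centroids, dtype=float)
    order = np.argsort(-C.sum(axis=1))  # old cluster ids, from highest to lowest total
    mapping = np.empty(len(C), dtype=int)
    mapping[order] = np.arange(len(C))  # mapping[old_id] = new rank
    return mapping[labels], C[order]
```

The returned centroids are reordered to match, so row 0 is always the highest-consumption cluster.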