Questions tagged [k-means]

k-means is a clustering algorithm, implemented in popular data science tools. Use this tag for questions related to the k-means clustering algorithm itself, or to its use with the tools that implement it (alongside other tags specific to those tools).

In statistics and data mining, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean. The objective is to minimize the within-cluster sum of squared deviations.
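The definition above can be illustrated with a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm), written in plain Python. The function names, the toy data, and the seeded random initialization are assumptions made for illustration; libraries such as scikit-learn or Spark MLlib use more careful initialization and stopping rules.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize with k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means will not change
        labels = new_labels
        # Update step: recompute each mean from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

The result depends on the initial centroids, and the procedure only reaches a local minimum of the squared-deviation objective, which is why practical implementations usually run several restarts.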

For details, see the Wikipedia entry at http://en.wikipedia.org/wiki/K-means_clustering

3514 questions

1 answer

Index out of Range in Spark MLlib K-means with TFIDF for Text Clustering

I am trying to run k-means using Spark MLlib, but I am getting an "index out of range" error. I've split my very small sample input file, and the output looks like this: ['hello', 'world', 'this', 'is', 'earth'] ['what', 'are', 'you', 'trying', 'to',…

asked Oct 06 '15 at 19:02

Nicky

1 answer

Weighting specific features in TF-IDF feature vectors for k-means clustering and cosine similarity

I have an array of TF-IDF feature vectors. I'd like to find similar vectors in the array using two methods: cosine similarity and k-means clustering. Using scikit-learn, this process is pretty simple. Now I'd like to weight certain features so that…

python machine-learning scikit-learn k-means tf-idf

asked Sep 22 '15 at 14:17

Andrew LaPrise

1 answer

Data normalization for K-Means algorithm

I want to cluster my data using the K-Means algorithm, and for this my data should be normalized. I don't know which normalization method is better for this algorithm (min-max, z-transformation, decimal, or...). RapidMiner normalized data with…

cluster-analysis normalization k-means

asked Sep 22 '15 at 05:58

Nervin

0 answers

Get center coordinate of grid blob

I have a 2D grid that changes every hour with this layout: The grid shape is about 765*700 and it contains several of these data clusters. Now we need to find those clusters and get the center pixel coordinate of each. The problem with K-means is that…

python numpy k-means

asked Jul 28 '15 at 15:49

Olivier vd Sloot

1 answer

I want to add "spheres" to my data cluster

I want to add a kind of "spheres" to my data cluster. My data cluster is this, which does not have "spheres". And this is my code: import numpy as np import matplotlib.pyplot as plt from matplotlib import style style.use('ggplot') import pandas as…

pandas matplotlib scikit-learn k-means

asked Jul 10 '15 at 13:05

Suzuki Soma

1 answer

How to create n-dimensional test data for cluster analysis?

I'm working on a C++ implementation of k-means and therefore I need n-dimensional test data. To begin with, 2D points are sufficient, since they can be visualized easily in a 2D image, but ultimately I'd prefer a general approach that supports n…

c++ algorithm c++11 cluster-analysis k-means

asked Jul 08 '15 at 10:13

mike

2 answers

Image segmentation of RGB image by k-means clustering in Python

I want to segment RGB images (satellite imagery) for land cover using k-means clustering in such a fashion that the different regions of the image are marked by different colors and, if possible, boundaries are created separating different regions.…

python image k-means

asked Jul 01 '15 at 06:43

RachJain

2 answers

Initialize kmeans, vector initial centroids, R

In this post there is a method to initialize the centers for the K-means algorithm in R. However, the data used therein is scalar (i.e. numbers). A variation on this question: what if the data has multiple dimensions? In that case, the new centers…

r k-means

asked Jun 30 '15 at 18:09

JCBR

1 answer

R loops and data.frame

A part of the code is sse <-c() k <- c() for (i in seq(3, 15, 1)) { y_pred <-knn(train = newdata.training, test = newdata.test, cl = newdata.trainLabels, k=i) pred_y <-…

r algorithm k-means

asked Jun 28 '15 at 17:01

R. hacker

2 answers

Is the streaming k-means clustering predefined in the MLlib library of Spark supervised or unsupervised?

I know that k-means clustering is one of the simplest unsupervised learning algorithms. Looking at the source code of streaming k-means clustering packaged in MLlib, I find the terms: training data, test data, predict, and train. This makes me think…

scala machine-learning apache-spark k-means spark-streaming

asked Jun 22 '15 at 04:16

Nina Queen

1 answer

Clustering based on pearson correlation

I have a use case where I have traffic data for every 15 minutes for 1 month. This data is collected for various resources in the network. Now I need to group resources which are similar (based on traffic usage pattern from 00:00 to 23:45). One…

cluster-analysis data-mining k-means hierarchical-clustering dbscan

asked Jun 11 '15 at 10:36

Bankelaal

0 answers

Running k-means clustering source code (MLlib) in Apache Spark and Java

I want to run this source code of K-Means clustering (MLlib) on Spark 1.3.1 with Java: import java.util.regex.Pattern; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import…

java hadoop apache-spark k-means spark-streaming

asked Jun 07 '15 at 22:27

Marco Marco

0 answers

How do I implement k-means clustering in a CBIR system?

I'm trying to perform content-based image retrieval (CBIR) using k-means clustering. I use the PCA function princomp() with a feature vector length of 190. I have 500 test images in color taken from here. There are 5 categories in total. When I run my…

matlab k-means cbir

asked Jun 06 '15 at 15:37

zero one

0 answers

kmeans: Required argument 'flags' (pos 6) not found

I want to cluster the LBP values with kmeans. First I compute the LBP values in size 8 * 8 for every image. Then I use cv2.kmeans, but it doesn't work. The error: Required argument 'flags' (pos 6) not found. Code: # -*- coding: utf-8 -*- """ Spyder…

python k-means flags

asked May 28 '15 at 07:35

user3960019

2 answers

K-means clustering with a different distance function in Lab space

Problem: To cluster similar colour pixels in CIE LAB using k-means. I want to use CIE94 for the distance between 2 pixels. Formula of CIE94. What I read was that k-means works in "Euclidean space" where the positional coordinates are minimised by cost…

image-processing data-mining k-means hierarchical-clustering color-space

asked May 27 '15 at 13:26

pg20
