Questions tagged [k-means]

k-means is a clustering algorithm, implemented in popular data science tools. Use this tag for questions related to the k-means clustering algorithm itself, or to its use with the tools that implement it (alongside other tags specific to those tools).

In statistics and data mining, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, with the goal of minimizing the within-cluster sum of squared deviations.
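
As a rough illustration of the definition above, here is a minimal pure-Python sketch of the standard iterative procedure (often called Lloyd's algorithm): assign each observation to the nearest mean, then recompute each mean. The names `squared_distance`, `kmeans`, `initial_centroids` and the sample points are hypothetical and chosen only for this example; real projects would normally use a library implementation instead.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iterations=100):
    """Sketch of Lloyd's algorithm; returns (centroids, labels).

    Assumption: the caller supplies the starting centroids, so the
    result is deterministic. Libraries usually pick them for you.
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means would not move either
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned observations.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Made-up 2D data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the starting centroids, which is why library implementations typically offer smarter initialisation and multiple restarts.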

For details, see the Wikipedia entry at http://en.wikipedia.org/wiki/K-means_clustering

3514 questions
1
vote
1 answer

Index out of Range in Spark MLlib K-means with TF-IDF for Text clustering

I am trying to run k-means using Spark MLlib, but I am getting an index out of range error. I've split my very small sample input file and the output looks like this: ['hello', 'world', 'this', 'is', 'earth'] ['what', 'are', 'you', 'trying', 'to',…
Nicky
1
vote
1 answer

Weighting specific features in TF-IDF feature vectors for k-means clustering and cosine similarity

I have an array of TF-IDF feature vectors. I'd like to find similar vectors in the array using two methods: cosine similarity and k-means clustering. Using scikit-learn, this process is pretty simple. Now I'd like to weight certain features so that…
Andrew LaPrise
1
vote
1 answer

Data normalization for K-Means algorithm

I want to cluster my data using the K-Means algorithm, and for this my data should be normalized. I don't know which normalization method is better for this algorithm (min-max, z-transformation, decimal, or...). RapidMiner normalized data with…
Nervin
1
vote
0 answers

Get center coordinate of grid blob

I have a 2D grid that changes every hour with this layout: The grid shape is about 765*700 and it contains several of these data clusters. Now we need to find those clusters and get the center pixel coordinate of each. The problem with K-means is that…
1
vote
1 answer

I want to add "spheres" to my data cluster

I want to add a kind of "spheres" to my data cluster. My data cluster is this, which does not have "spheres". And this is my code: import numpy as np import matplotlib.pyplot as plt from matplotlib import style style.use('ggplot') import pandas as…
Suzuki Soma
1
vote
1 answer

How to create n-dimensional test data for cluster analysis?

I'm working on a C++ implementation of k-means and therefore I need n-dimensional test data. To begin with, 2D points are sufficient, since they can be visualized easily in a 2D image, but ultimately I'd prefer a general approach that supports n…
mike
1
vote
2 answers

Image segmentation of an RGB image by k-means clustering in Python

I want to segment RGB images (satellite imagery) for land cover using k-means clustering in such a fashion that the different regions of the image are marked by different colors and, if possible, boundaries are created separating different regions.…
RachJain
1
vote
2 answers

Initialize kmeans, *vector* initial centroids, R

In this post there is a method to initialize the centers for the K-means algorithm in R. However, the data used therein is scalar (i.e. numbers). A variation on this question: what if the data has multiple dimensions? In that case, the new centers…
JCBR
1
vote
1 answer

R loops and data.frame

A part of the code is sse <-c() k <- c() for (i in seq(3, 15, 1)) { y_pred <-knn(train = newdata.training, test = newdata.test, cl = newdata.trainLabels, k=i) pred_y <-…
R. hacker
1
vote
2 answers

Is the streaming k-means clustering provided in Spark's MLlib library supervised or unsupervised?

I know that k-means clustering is one of the simplest unsupervised learning algorithms. Looking at the source code of streaming k-means clustering packaged in MLlib, I find the terms: training data, test data, predict, and train. This makes me think…
1
vote
1 answer

Clustering based on Pearson correlation

I have a use case where I have traffic data for every 15 minutes for 1 month. This data is collected for various resources in the network. Now I need to group resources which are similar (based on traffic usage pattern from 00:00 to 23:45). One…
1
vote
0 answers

Running k-means clustering source code (MLlib) in Apache Spark and Java

I want to run this source code of K-Means clustering (MLlib) on Spark 1.3.1 with Java: import java.util.regex.Pattern; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import…
1
vote
0 answers

How do I implement k-means clustering in a CBIR system?

I'm trying to perform content-based image retrieval (CBIR) using k-means clustering. I use the PCA function princomp() with a feature vector length of 190. I have 500 test images in color taken from here. There are 5 categories in total. When I run my…
zero one
1
vote
0 answers

kmeans: Required argument 'flags' (pos 6) not found

I want to cluster LBP values with k-means. First I compute the LBP value at size 8 * 8 for every image, then I use cv2.kmeans, but it does not work. The error: Required argument 'flags' (pos 6) not found. Code: # -*- coding: utf-8 -*- """ Spyder…
user3960019
1
vote
2 answers

K-means clustering with a different distance function in Lab space

Problem: cluster similar-colour pixels in CIE LAB using k-means. I want to use CIE94 as the distance between two pixels (formula of CIE94). What I read is that k-means works in "Euclidean space", where the positional coordinates are minimised by cost…