Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine-learning and data-mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical-clustering.

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

3 answers

Efficient k-means evaluation with silhouette score in sklearn

I am running k-means clustering on ~1 million items (each represented as a ~100-feature vector). I have run the clustering for various k, and now want to evaluate the different results with the silhouette score implemented in sklearn. Attempting to…

python scikit-learn cluster-analysis

asked May 15 '14 at 19:41

moustachio

4 answers

Trajectory Clustering: Which Clustering Method?

As a newbie in Machine Learning, I have a set of trajectories that may be of different lengths. I wish to cluster them, because some of them are actually the same path and they just SEEM different due to the noise. In addition, not all of them are…

algorithm machine-learning cluster-analysis data-mining

asked Sep 16 '13 at 05:18

Sibbs Gambling

8 answers

Efficient way of calculating likeness scores of strings when sample size is large?

Let's say that you have a list of 10,000 email addresses, and you'd like to find what some of the closest "neighbors" in this list are - defined as email addresses that are suspiciously close to other email addresses in your list. I'm aware of how…

algorithm string cluster-analysis complexity-theory edit-distance

asked Oct 22 '09 at 20:24

matt b
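
For reference, the brute-force baseline this question starts from can be sketched as follows. This is a minimal illustration, not the approach from any of the answers: it assumes Levenshtein distance as the "likeness" score, and the names `levenshtein` and `close_pairs` are hypothetical. It compares every pair, so it needs on the order of n² distance computations (roughly 50 million pairs for 10,000 addresses), which is the cost the question asks how to avoid.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def close_pairs(items, max_distance):
    """Return every pair whose edit distance is at most max_distance."""
    return [
        (x, y)
        for x, y in combinations(items, 2)
        # Cheap filter: the length difference is a lower bound on the distance.
        if abs(len(x) - len(y)) <= max_distance
        and levenshtein(x, y) <= max_distance
    ]


if __name__ == "__main__":
    emails = ["a@x.com", "a@x.con", "bob@y.org"]
    print(close_pairs(emails, max_distance=1))
```

The length filter only skips pairs that cannot match; it does not change the quadratic worst case.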

2 answers

R: How to overlay pie charts on 'dots' in a scatterplot in R

Using R I would like to replace the points in a 2d scatter plot by a pie chart displaying additional values. The rationale behind this is that I have time series data for hundreds of elements (proteins) derived from a biological experiment monitored for 4…

r charts ggplot2 visualization cluster-analysis

asked Feb 10 '12 at 18:46

philipp

1 answer

How to specify a distance metric for kmeans in R?

I'm doing kmeans clustering in R with two requirements: I need to specify my own distance function, currently the Pearson coefficient. I want to do the clustering that uses the average of group members as centroids, rather than some actual member. The reason for…

r cluster-analysis k-means

asked Sep 23 '11 at 03:51

Derrick Zhang

5 answers

Java machine learning library for commercial use?

Does anyone know a good Java machine learning library I can use for a commercial product? Weka and Rapidminer unfortunately do not allow this. I already found Apache Mahout and Java Data Mining Package. Has anyone experience with them and provide…

java machine-learning cluster-analysis classification

asked Jul 26 '11 at 11:32

WorstCase

5 answers

Graph Theory: Calculating Clustering Coefficient

I'm doing some research and I've come to a point where I have to calculate the clustering coefficient of a graph. According to this paper directly related to my research: The clustering coefficient C(p) is defined as follows. Suppose that a vertex v…

algorithm cluster-analysis graph-theory

asked Jul 10 '11 at 20:45

Griffin
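
The quoted definition is cut off in the excerpt, so the sketch below assumes the usual local clustering coefficient: for a vertex v with k neighbours, the fraction of the k(k-1)/2 possible edges among those neighbours that actually exist, averaged over all vertices. The function names and the adjacency-dict representation are illustrative choices, and treating vertices with fewer than two neighbours as having coefficient 0 is a convention assumed here, not something stated in the question.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of possible edges among v's neighbours that are present.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs, so coefficient 0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one extra vertex 3 attached to vertex 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(local_clustering(graph, 2))
    print(average_clustering(graph))
```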

6 answers

How do I create a radial cluster like the following code-example in Python?

I've found several examples on how to create these exact hierarchies (at least I believe they are) like the following here stackoverflow.com/questions/2982929/ which work great, and almost perform what I'm looking for. [EDIT]Here's a simplified…

python numpy scipy cluster-analysis dendrogram

asked Feb 23 '11 at 09:28

T Carrasco

2 answers

Interest and location based algorithm for android mobile app

I am trying to work on an Android mobile app where I have functionality to find matches according to interest and location. Many dating apps already do something like this; for example, Tinder matches based on location, gender and age…

android algorithm firebase match cluster-analysis

asked Apr 23 '17 at 16:23

N Sharma

2 answers

Image clustering by its similarity in python

I have a collection of photos and I'd like to distinguish clusters of the similar photos. Which features of an image and which algorithm should I use to solve my task?

python machine-learning computer-vision cluster-analysis

asked Aug 24 '16 at 12:31

alex

1 answer

How to Bound the Outer Area of Voronoi Polygons and Intersect with Map Data

Background I'm trying to visualize the results of a kmeans clustering procedure on the following data using voronoi polygons on a US map. Here is the code I've been running so far: input <- read.csv("LatLong.csv", header = T, sep = ",") # K Means…

r ggplot2 cluster-analysis data-visualization voronoi

asked Mar 25 '16 at 14:26

Rick Arko

4 answers

Python Clustering 'purity' metric

I'm using a Gaussian Mixture Model (GMM) from sklearn.mixture to perform clustering of my data set. I could use the function score() to compute the log probability under the model. However, I am looking for a metric called 'purity' which is defined…

python scikit-learn cluster-analysis

asked Dec 02 '15 at 16:14

Kuka
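
The definition is truncated in the excerpt; the sketch below assumes the standard one, in which each cluster is credited with the count of its most frequent true class and the total is divided by the number of samples. The function name `purity` is hypothetical and the code uses only the standard library, so it is independent of sklearn; the cluster labels could come from any model's predictions.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_labels):
    """Share of samples that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("purity is undefined for an empty data set")

    # For each cluster, count how often each true class occurs in it.
    classes_per_cluster = defaultdict(Counter)
    for true_label, cluster_label in zip(true_labels, cluster_labels):
        classes_per_cluster[cluster_label][true_label] += 1

    majority_total = sum(
        max(counts.values()) for counts in classes_per_cluster.values()
    )
    return majority_total / len(true_labels)


if __name__ == "__main__":
    truth = ["a", "a", "a", "b", "b", "c"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(purity(truth, clusters))
```

Purity rises toward 1 as the number of clusters grows (one cluster per sample is perfectly pure), so it is usually read alongside the cluster count.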

3 answers

What is a convenient way to do document clustering with elasticsearch?

I have stored a lot of news articles from RSS feeds from different sources in an elasticsearch index. At the moment when I do a search query, it will return me a lot of similar news articles for one query, because the same news topics get covered…

algorithm elasticsearch cluster-analysis

asked Feb 06 '15 at 17:44

asmaier

1 answer

Approaches for spatial geodesic latitude longitude clustering in R with geodesic or great circle distances

I would like to apply some basic clustering techniques to some latitude and longitude coordinates. Something along the lines of clustering (or some unsupervised learning) the coordinates into groups determined either by their great circle distance…

r cluster-analysis

asked Jan 13 '14 at 15:23

JasonAizkalns
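
The question asks about R; purely as an illustration of the distance involved, here is a sketch in Python of the haversine formula for great-circle distance, plus a pairwise distance matrix of the kind a distance-based clustering method (hierarchical clustering, for instance) can consume. The function names and the mean Earth radius constant are assumptions of this sketch, and the formula treats the Earth as a sphere, so it approximates rather than reproduces true geodesic distance on an ellipsoid.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(points):
    """Symmetric matrix of pairwise distances for (lat, lon) tuples."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*points[i], *points[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    coords = [(0.0, 0.0), (0.0, 90.0), (45.0, 45.0)]
    for row in distance_matrix(coords):
        print([round(d, 1) for d in row])
```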

3 answers

An understandable clusterization

I have a dataset. Each element of this set consists of numerical and categorical variables. Categorical variables are nominal and ordinal. There is some natural structure in this dataset. Commonly, experts clusterize datasets such as mine using…

algorithm machine-learning computer-science data-mining cluster-analysis

asked Aug 28 '12 at 08:01

Artem Pianykh
