Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that objects in the same cluster are similar in some sense, while objects in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
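To make the k-means algorithm named above concrete, here is a minimal sketch of Lloyd's iteration in plain Python. The function name `kmeans`, its parameters, and the sample points are illustrative assumptions rather than any library's API; for real work, an established implementation such as scikit-learn's `KMeans` is probably the better choice.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points (a simple, common choice).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance, ties go to the lower index).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical sample data: two well-separated groups of three points each.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_labels, sample_centroids = kmeans(sample_points, k=2)
print(sample_labels)
print(sample_centroids)
```

The result depends on the initial centroids, which is why the sketch takes a `seed`; k-means only finds a local optimum, so several restarts with different seeds are commonly used.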

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

3 answers

News clustering

How do Google News and Techmeme cluster news items that are similar? Is there any well-known algorithm that is used to achieve this? Appreciate your help. Thanks in advance.

algorithm cluster-analysis

asked Apr 24 '09 at 05:10

niraj

3 answers

Where to find a reliable K-medoid (not k-means) open source software/tool?

I am learning the K-medoids algorithm, so I am sorry if I ask inappropriate questions. As far as I know, the K-medoids algorithm performs a K-means-style clustering but uses actual data points as centroids instead of mathematically calculated means. As I googled…

open-source cluster-analysis k-means

asked Oct 05 '11 at 20:03

Cassie

2 answers

Weka simple K-means clustering assignments

I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry. I am using…

cluster-analysis data-mining weka k-means

asked Jul 13 '11 at 21:32

machine yearning

1 answer

How to find the success rate of a clustering algorithm?

I have implemented several clustering algorithms on an image dataset. I'm interested in deriving the success rate of clustering. I have to detect the tumor area; in the original image I know where the tumor is located, and I would like to compare the…

python image-processing cluster-analysis analysis

asked Jul 25 '18 at 17:56

GuroTozzi

3 answers

Clustering with a distance matrix

I have a (symmetric) matrix M that represents the distance between each pair of nodes. For example,

         A    B    C    D    E    F    G    H    I    J    K    L
    A    0   20   20   20   40   60   60   60  100  120  120  120
    B   20    0   20   20   60   80   80   80  120  140  140…

algorithm matrix cluster-analysis distance

asked Sep 16 '10 at 09:01

yassin

3 answers

Newman's modularity clustering for graphs

I am interested in running Newman's modularity clustering algorithm on a large graph. If you can point me to a library (or R package, etc) that implements it I would be most grateful. best ~lara

r statistics graph cluster-analysis modularity

asked Aug 19 '10 at 15:54

laramichaels

1 answer

Variation on "How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?"

This is a question related to https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o For completeness, here's the original example from that…

r visualization cluster-analysis nearest-neighbor

asked Jul 05 '15 at 20:20

Rafael Santos

2 answers

How to print result of clustering in sklearn

I have a sparse matrix from scipy.sparse import * M = csr_matrix((data_np, (rows_np, columns_np))); then I'm doing clustering that way from sklearn.cluster import KMeans km = KMeans(n_clusters=n, init='random', max_iter=100, n_init=1,…

python scikit-learn cluster-analysis k-means

asked Apr 22 '15 at 13:26

thepolina

3 answers

How can GridSearchCV be used for clustering (MeanShift or DBSCAN)?

I'm trying to cluster some text documents using scikit-learn. I'm trying out both DBSCAN and MeanShift and want to determine which hyperparameters (e.g. bandwidth for MeanShift and eps for DBSCAN) best work for the kind of data I'm using (news…

python scikit-learn cluster-analysis

asked Sep 02 '14 at 22:27

frnsys

3 answers

Extract labels membership / classification from a cut dendrogram in R (i.e.: a cutree function for dendrogram)

I'm trying to extract a classification from a dendrogram in R that I've cut at a certain height. This is easy to do with cutree on an hclust object, but I can't figure out how to do it on a dendrogram object. Further, I can't just use my clusters…

r classification cluster-analysis dendrogram dendextend

asked Aug 22 '14 at 17:25

Oreotrephes

3 answers

Cosine distance as vector distance function for k-means

I have a graph of N vertices where each vertex represents a place. Also I have vectors, one per user, each one of N coefficients where the coefficient's value is the duration in seconds spent at the corresponding place or 0 if that place was not…

cluster-analysis data-mining distance k-means cosine-similarity

asked Aug 07 '14 at 11:15

Thalis K.

2 answers

Algorithm to decide cut-off for collapsing this tree?

I have a Newick tree that is built by comparing similarity (Euclidean distance) of Position Weight Matrices (PWMs or PSSMs) of putative DNA regulatory motifs that are 4-9 bp long DNA sequences. An interactive version of the tree is up on iTol…

python statistics cluster-analysis bioinformatics

asked Apr 28 '14 at 16:54

hello_there_andy

2 answers

What method do you use for selecting the optimum number of clusters in k-means and EM?

Many algorithms for clustering are available. A popular algorithm is K-means, where, based on a given number of clusters, the algorithm iterates to find the best clusters for the objects. What method do you use to determine the number of clusters in…

r cluster-analysis data-mining expectation-maximization

asked Feb 22 '10 at 17:53

gd047

4 answers

Python Clustering Algorithms

I've been looking around scipy and sklearn for clustering algorithms for a particular problem I have. I need some way of characterizing a population of N particles into k groups, where k is not necessarily known, and in addition to this, no a priori…

cluster-analysis k-means dbscan

asked Nov 13 '13 at 14:59

astromax

4 answers

Hierarchical Clustering: Determine optimal number of clusters and statistically describe clusters

I could use some advice on methods in R to determine the optimal number of clusters and later on describe the clusters with different statistical criteria. I’m new to R with basic knowledge about the statistical foundations of cluster analysis.…

r data-mining cluster-analysis

asked Nov 06 '12 at 10:51

Joschi
