Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
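As a rough illustration of the first algorithm named above, here is a minimal, library-free sketch of k-means (Lloyd's algorithm). The function name `kmeans`, its parameters, and the toy data are assumptions made up for this example; a real project would more likely use an existing implementation.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise the centroids from k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
```

The sketch runs a fixed number of iterations instead of testing for convergence, which keeps it short but is not how production implementations usually stop.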

Related topics: classification, pattern recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

1 answer

Dendrogram or Other Plot from Distance Matrix

I have three matrices to compare. Each of them is 5x6. I originally wanted to use hierarchical clustering to cluster the matrices, such that the most similar matrices are grouped, given a threshold of similarity. I could not find any such functions…

python matrix scipy cluster-analysis dendrogram

asked Jan 01 '17 at 15:19

amc

3 answers

Sklearn : Mean Distance from Centroid of each cluster

How can I find the mean distance from the centroid to all the data points in each cluster? I am able to find the Euclidean distance of each point (in my dataset) from the centroid of each cluster. Now I want to find the mean distance from centroid…

python numpy scikit-learn cluster-analysis k-means

asked Nov 27 '16 at 12:26

Rezwan
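
One way the quantity asked about above could be computed is sketched below, without any library. The helper name `mean_distance_to_centroid` and the sample data are hypothetical; with scikit-learn, the labels would typically come from a fitted model's `labels_` attribute. Here each centroid is recomputed as the mean of its members, which is an assumption and may differ slightly from the centers a fitted model reports.

```python
import math
from collections import defaultdict


def mean_distance_to_centroid(points, labels):
    """Return {label: mean Euclidean distance of members to their centroid}."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    result = {}
    for lab, members in clusters.items():
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        # Average the per-point distances to that centroid.
        result[lab] = sum(math.dist(p, centroid) for p in members) / len(members)
    return result


# Hypothetical data: two clusters of two points each.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
labels = [0, 0, 1, 1]
mean_dist = mean_distance_to_centroid(points, labels)
```

`math.dist` requires Python 3.8 or later.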

1 answer

How to do clustering using the matrix of correlation coefficients?

I have a correlation coefficient matrix (n*n). How can I do clustering using the correlation coefficient matrix? Can I use the linkage and fcluster functions in SciPy? The linkage function needs an n*m matrix (according to the tutorial), but I want to use n*n…

python scipy cluster-analysis correlation linkage

asked Jun 28 '16 at 08:04

Siny
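
A common approach to this kind of problem is to turn each correlation r into a dissimilarity such as 1 - r and hand the pairwise distances to the clustering routine. The sketch below only does the conversion, in plain Python; the function name `correlation_to_condensed` and the example matrix are assumptions. It lists the upper triangle row by row, which is the condensed 1-D layout that SciPy's `linkage` accepts in place of an n*m observation matrix.

```python
def correlation_to_condensed(corr):
    """Convert an n*n correlation matrix to condensed distances (1 - r).

    The result lists the upper triangle row by row:
    d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ...
    """
    n = len(corr)
    return [1.0 - corr[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical 3x3 correlation matrix.
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
condensed = correlation_to_condensed(corr)
```

Whether 1 - r is the right dissimilarity is a design choice: it treats strong negative correlation as maximally distant, while 1 - |r| would treat it as close.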

1 answer

Plotting the boundaries of cluster zone in Python with scikit package

Here is my simple example of clustering data with 3 attributes (x, y, value). Each sample represents its location (x, y) and its associated variable. My code is posted here: x = np.arange(100,200,1) y = np.arange(100,200,1) value =…

python matplotlib scikit-learn cluster-analysis k-means

asked Jun 08 '16 at 15:58

Han Zhengzu

1 answer

Sklearn AffinityPropagation MemoryError

I think I already know my answer, but there are a lot of smarter and more experienced people out there than me, so I wanted to ask. I'm running into a MemoryError when trying to fit my hash_matrix () to AffinityPropagation. …

python memory machine-learning scikit-learn cluster-analysis

asked Mar 09 '16 at 20:17

Jarad

2 answers

Estimate the minimum Distance between two Clusters

I am designing an agglomerative, bottom-up clustering algorithm for millions of 50-1000 dimensional points. In two parts of my algorithm, I need to compare two clusters of points and decide the separation between the two clusters. The exact distance…

algorithm cluster-analysis distance approximation

asked Jan 06 '16 at 17:20

Paul Chernoch

2 answers

Efficient algorithm to group points in clusters by distance between every two points

I am looking for an efficient algorithm for the following problem: Given a set of points in 2D space, where each point is defined by its X and Y coordinates, split this set of points into a set of clusters so that if the distance between two…

algorithm machine-learning cluster-analysis data-mining

asked Sep 06 '15 at 21:34

ovk

1 answer

User profiling with Mahout from categorized user behavior

I'm trying to cluster and classify users with Mahout. At the moment I am at the planning phase, my head is full of competing ideas, and since I'm relatively new to the area I'm stuck at the data formatting. Let's say we have two data tables (big…

classification cluster-analysis mahout

asked Jun 29 '15 at 23:11

Turcia

3 answers

Clustering a large, very sparse, binary matrix in R

I have a large, sparse binary matrix (roughly 39,000 x 14,000; most rows have only a single "1" entry). I'd like to cluster similar rows together, but my initial plan takes too long to complete: d <- dist(inputMatrix, method="binary") hc <-…

r performance matrix cluster-analysis sparse-matrix

asked Jun 19 '15 at 18:11

Matt LaFave

4 answers

k-means clustering in R on very large, sparse matrix?

I am trying to do some k-means clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row). The whole thing does not fit into memory, so I converted it into a sparse…

r cluster-analysis sparse-matrix

asked Jun 14 '10 at 18:03

movingabout

3 answers

Algorithm for clustering with minimum size constraints

I have a set of data clustered into k groups; each cluster has a minimum size constraint of m. I've done some reclustering of the data. So now I have a set of points, each of which has one or more better clusters to be in, but cannot be switched…

algorithm cluster-analysis

asked May 07 '15 at 22:01

qshng

1 answer

K means clustering for multidimensional data

If the data set has 440 objects and 8 attributes (the dataset was taken from the UCI Machine Learning Repository), how do we calculate centroids for such datasets? (wholesale customers…

machine-learning cluster-analysis

asked Sep 03 '14 at 17:24

Suvidha

4 answers

Python KMeans clustering words

I am interested in performing k-means clustering on a list of words with the distance measure being Levenshtein. 1) I know there are a lot of frameworks out there, including scipy and orange, that have a k-means implementation. However they all require…

python cluster-analysis

asked Mar 17 '10 at 03:29

sadawd

3 answers

Clustering algorithm in R for missing categorical and numerical values

I want to perform marketing segmentation clustering on a dataset with missing categorical and numerical values in R. I cannot perform k-means clustering because of the missing values. R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin13.1.0…

r machine-learning cluster-analysis missing-data

asked Jun 03 '14 at 23:26

Scott Davis

1 answer

How to add ColSideColors on heatmap.2 after performing bi-clustering (row and column)

I have the following code: library(gplots) library(RColorBrewer); setwd("~/Desktop") mydata <- mtcars hclustfunc <- function(x) hclust(x, method="complete") distfunc <- function(x) dist(x,method="euclidean") d <- distfunc(mydata) fit <-…

r plot cluster-analysis heatmap

asked Mar 09 '14 at 04:46

pdubois
