Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
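As a rough illustration of the grouping described above, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The function name `kmeans`, the sample points, and the choice of squared Euclidean distance are assumptions made for this example; real projects would normally use a library implementation instead.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical data: two visually separate groups of 2-D points.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels, centroids = kmeans(points, k=2)
```

The cluster indices themselves are arbitrary; only which points share an index is meaningful.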

Related topics: classification, pattern recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
2
votes
2 answers

How to cluster list-of-list by distance condition in Python

I have the following list of lists that contains 6 entries:
lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02], ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
Each sublist in lol contains 3 elements: ['a',…
pdubois
  • 7,640
  • 21
  • 70
  • 99
2
votes
2 answers

Clustering a long list of words

I have the following problem at hand: I have a very long list of words, possibly names, surnames, etc. I need to cluster this word list such that similar words, for example words with a small edit (Levenshtein) distance between them, appear in the same cluster.…
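One possible way to approach a problem like this, sketched under assumptions: compute the Levenshtein distance between every pair of words and merge words whose distance is at or below a threshold (single-linkage grouping via union-find). The names `levenshtein` and `cluster_words`, the threshold `max_distance=1`, and the sample words are all hypothetical. The pairwise loop is quadratic in the number of words, so a genuinely very long list would need some form of blocking or indexing first.

```python
def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def cluster_words(words, max_distance=1):
    """Group words connected by a chain of pairs within max_distance."""
    parent = list(range(len(words)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if levenshtein(words[i], words[j]) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)
    return list(groups.values())

# Hypothetical sample of surnames.
words = ["smith", "smyth", "smithe", "jones", "jonas", "brown"]
clusters = cluster_words(words)
```

Because the grouping is single-linkage, two words in the same cluster may be further apart than the threshold as long as a chain of close pairs connects them.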
2
votes
4 answers

Translation of clustering problem to graph theory language

I have a rectangular planar grid, with each cell assigned some integer weight. I am looking for an algorithm to identify clusters of 3 to 6 adjacent cells with higher-than-average weight. These blobs should have an approximately circular shape. For my…
Benjamin Bannier
  • 55,163
  • 11
  • 60
  • 80
2
votes
1 answer

R: clustering documents

I've got a documentTermMatrix that looks as follows:

        artikel  naam  product  personeel  loon  verlof
doc 1         1     1        2          1     0       0
doc 2         1     1        1          0     0       0
doc 3         0     0        1          1  …
Anita
  • 759
  • 1
  • 10
  • 23
2
votes
2 answers

MATLAB: draw centroids

My main question is given a feature centroid, how can I draw it in MATLAB? In more detail, I have an NxNx3 image (an RGB image) of which I take 4x4 blocks and compute a 6-dimensional feature vector for each block. I store these feature vectors in an…
Myx
  • 1,792
  • 5
  • 23
  • 37
2
votes
1 answer

Outlier removal before or after Kalman filtering?

I am getting radar data points in the form of (x, y) coordinates relative to my position every ms (around 10-15 data points). Now, in order to get a better position estimate of the points, I would like to apply a Kalman filter. I also would like to…
2
votes
1 answer

Spectral clustering with sklearn and a big affinity matrix

I am trying to use the spectral clustering method provided by scikit-learn to aggregate the rows of my dataset (only 16000 of them). My issue arises after I precompute the affinity matrix (a 16000x16000 float matrix) which allocates 3 gigabyte…
rano
  • 5,616
  • 4
  • 40
  • 66
2
votes
1 answer

How to find clusters of values in numpy array

I have an array (M x N) of air pressure data (gridded model data). There are also two arrays (also M x N) for latitudes and longitudes. To build a GeoJSON of isobars (surfaces of equal pressure) I need to find clusters of pressure values with given…
bolkhovsky
  • 120
  • 1
  • 8
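The excerpt above is cut off before it states the exact criterion, so the following is only a sketch under an assumed one: label 4-connected regions of grid cells whose value lies inside a band `[low, high]`. The function name `label_regions`, the band limits, and the small `pressure` grid are hypothetical. It uses plain nested lists and a breadth-first flood fill; with NumPy data, `scipy.ndimage.label` applied to a boolean mask performs the same kind of connected-component labeling.

```python
from collections import deque

def label_regions(grid, low, high):
    """Label 4-connected regions of cells with low <= value <= high.

    Returns (labels, count); labels holds 0 for cells outside the band
    and 1..count for cells inside it.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] or not (low <= grid[r][c] <= high):
                continue
            # Unlabeled in-band cell: start a new region and flood-fill it.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and low <= grid[ny][nx] <= high):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Hypothetical 3 x 4 pressure grid (hPa).
pressure = [
    [1012, 1013, 1020, 1021],
    [1011, 1013, 1022, 1012],
    [1019, 1021, 1023, 1013],
]
labels, n = label_regions(pressure, 1010, 1015)
```

Each labeled region could then be mapped back to latitude and longitude through the matching cells of the coordinate arrays.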
2
votes
1 answer

Best practices for building a simple, scalable cluster on Amazon EC2 for a Java web app

I want to build a Java web app and deploy it on EC2. It will be written in Java and will use MySQL. I was hoping to get some pointers on the actual deployment process and configuration. In particular I'm interested in the following topics: machine…
albogdano
  • 2,710
  • 2
  • 33
  • 43
2
votes
1 answer

Phase based event detection from time-series data

I have a large time series (a 1D floating-point array) which represents various events. Similar events have similar phases. However, I don't know the number of events that occurred during that time. Is it possible to write a program (preferably in…
precision
  • 293
  • 2
  • 15
2
votes
1 answer

Clustering unstructured text based on similarity and calculating optimum number of clusters

I am a data mining beginner and am trying to first formulate an approach to a clustering problem I am solving. Suppose we have x writers, each with a particular style (use of unique words etc.). They each write multiple short texts, let's say a…
2
votes
1 answer

Clustering in Matlab

Hi, I am trying to cluster using linkage(). Here is the code I am trying:
Y = pdist(data);
Z = linkage(Y);
T = cluster(Z,'maxclust',4096);
I am getting the following error:
The number of elements exceeds the maximum allowed size in MATLAB. Error in…
2
votes
0 answers

Computing Silhouette Width - special case

I am completely redrafting this question following the advice of @MrFlick. Assume I have a data.frame like the following:
set.seed(1)
group<-(rep(1:10, sample(50:200, 10, replace=T)))
gender<-factor((sample(0:1, 1328, replace=T, prob=c(0.55,…
Riccardo
  • 743
  • 2
  • 5
  • 14
2
votes
1 answer

Problems with gmdistribution.fit

I'm trying to do clustering with gm. I tried this code:
opts = statset('MaxIter', 300, 'Display', 'iter');
gm = gmdistribution.fit(braindata, nsegments, 'Regularize', 1e-6, 'Options', opts);
where braindata is a data matrix (voxel*protein,…
Victor
  • 21
  • 2
2
votes
1 answer

Finding defined peaks with Clusters in MATLAB

This is my problem: I have the following data "A", which looks like: As you can see, I have circled the apparent peaks in red. The most well-defined are 2 and 7; I say they are well defined because their standard deviation is low in comparison with…
lisandrojim
  • 509
  • 5
  • 18