Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
10
votes
2 answers

How to compute distances between centroids and data matrix (for kmeans algorithm)

I am a student of clustering and R. In order to obtain a better grip of both I would like to compute the distance between centroids and my xy-matrix for each iteration till it "converges". How can I solve for step 2 and 3 using R? library(fields) x…
Mamba
  • 1,183
  • 2
  • 13
  • 33
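
The distance step asked about above can be sketched in a language-neutral way. The following is a minimal illustration in Python rather than R, and it is not the asker's code: the function names `distances_to_centroids` and `kmeans`, the sample points, and the starting centroids are all assumptions made for the example. It computes the point-to-centroid distance matrix on every iteration and stops once the assignments no longer change.

```python
import math


def distances_to_centroids(points, centroids):
    """Return d where d[i][j] is the Euclidean distance from points[i] to centroids[j]."""
    return [[math.dist(p, c) for c in centroids] for p in points]


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration: assign, recompute, stop when labels stop changing."""
    labels = None
    for _ in range(max_iter):
        dist = distances_to_centroids(points, centroids)
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [row.index(min(row)) for row in dist]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(tuple(old))  # keep an empty cluster's centroid in place
        centroids = updated
    return labels, centroids


# Hypothetical sample data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Keeping the distance matrix as its own function makes it easy to print or inspect the distances at each iteration, which is what the question is after.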
10
votes
1 answer

Retrieving the optimal number of clusters in R

I have data for which I want to evaluate the optimal number of clusters according to the Gap statistic. I read the page on gap statistic in r which gives the following example: gs.pam.RU <- clusGap(ruspini, FUN = pam1, K.max = 8, B =…
teaLeef
  • 1,879
  • 2
  • 16
  • 26
10
votes
8 answers

How to manage session variables in a web cluster?

Session variables are normally kept in the web server's RAM. In a cluster, each request made by a client can be handled by a different cluster node, right? So, in this case, what happens with session variables? Aren't they stored in the…
Daniel Silveira
  • 41,125
  • 36
  • 100
  • 121
10
votes
6 answers

What are some packages that implement semi-supervised (constrained) clustering?

I want to run some experiments on semi-supervised (constrained) clustering, in particular with background knowledge provided as instance level pairwise constraints (Must-Link or Cannot-Link constraints). I would like to know if there are any good…
user1271286
  • 333
  • 5
  • 14
10
votes
1 answer

"NAs introduced by coercion" during Cluster Analysis in R

Guys, I'm new to this language. I'm running cluster analysis on a data frame, but when I calculate the distance I get this warning: "NAs introduced by coercion". What does this mean? d <- dist(as.matrix(mydata1)) Warning message: In…
Ravee
  • 145
  • 1
  • 2
  • 7
10
votes
7 answers

Clustering given pairwise distances with unknown cluster number?

I have a set of objects {obj1, obj2, obj3, ..., objn}. I have calculated the pairwise distances of all possible pairs. The distances are stored in an n*n matrix M, with Mij being the distance between obji and objj. Then it is natural to see M is a…
Sibbs Gambling
  • 19,274
  • 42
  • 103
  • 174
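
One way to cluster from a precomputed distance matrix without fixing the number of clusters in advance is to link every pair closer than a cutoff and take the connected components, which is equivalent to cutting a single-linkage dendrogram at that height. The sketch below assumes a symmetric matrix `M` and a user-chosen `threshold`; both the function name `threshold_clusters` and the sample matrix are hypothetical, and the cutoff still has to be picked by the user.

```python
def threshold_clusters(M, threshold):
    """Group indices whose pairwise distance chain stays within `threshold`.

    M is a symmetric n*n distance matrix given as a list of lists.
    Returns a sorted list of clusters, each a list of object indices.
    """
    n = len(M)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parents to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if M[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 4-object example: objects 0 and 1 are close, as are 2 and 3.
M = [
    [0, 1, 9, 8],
    [1, 0, 9, 9],
    [9, 9, 0, 2],
    [8, 9, 2, 0],
]
clusters = threshold_clusters(M, threshold=3)
```

The number of clusters falls out of the cutoff rather than being specified. Single linkage can chain distant objects together through intermediate ones, so other linkage criteria may suit some data better.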
10
votes
2 answers

Partitioning a float array into similar segments (clustering)

I have an array of floats like this: [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200] Now, I want to partition the array like this: [[1.91, 2.87, 3.61] , [10.91, 11.91, 12.82] , [100.73, 100.71, 101.89] , [200]] // [200] will…
alessandro
  • 1,681
  • 10
  • 33
  • 54
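
For one-dimensional data like the array above, a simple approach is to start a new segment whenever the jump between neighbouring values exceeds a cutoff. This is only a sketch under assumptions: the function name `split_on_gaps` and the cutoff `max_gap=5.0` are made up for illustration, and the code relies on similar values already being adjacent in the input, as they are in the sample.

```python
def split_on_gaps(values, max_gap):
    """Split `values` into runs, starting a new run when neighbours differ by more than max_gap."""
    if not values:
        return []
    segments = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if abs(cur - prev) > max_gap:
            segments.append([cur])    # large jump: open a new segment
        else:
            segments[-1].append(cur)  # small jump: extend the current segment
    return segments


data = [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]
segments = split_on_gaps(data, max_gap=5.0)
# segments == [[1.91, 2.87, 3.61], [10.91, 11.91, 12.82], [100.73, 100.71, 101.89], [200]]
```

A fixed cutoff is the weak point of this approach; for data whose scale varies, the cutoff would need to be derived from the gaps themselves.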
10
votes
1 answer

MATLAB: Self-Organizing Map (SOM) clustering

I'm trying to cluster some images depending on the angles between body parts. The features extracted from each image are: angle1 : torso - torso angle2 : torso - upper left arm .. angle10: torso - lower right foot Therefore the input data is a…
tguclu
  • 689
  • 3
  • 10
  • 25
10
votes
4 answers

In scikit-learn, can DBSCAN use sparse matrix?

I got a memory error when running the DBSCAN algorithm from scikit-learn. My data is about 20000*10000; it's a binary matrix. (Maybe it's not suitable to use DBSCAN with such a matrix. I'm a beginner in machine learning. I just want to find a cluster method…
10
votes
2 answers

What is the relation between topic modeling and document clustering?

Topic modeling identifies distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to do document clustering?
afs
  • 167
  • 1
  • 9
10
votes
7 answers

Generating 'neighbours' for users based on rating

I'm looking for techniques to generate 'neighbours' (people with similar taste) for users on a site I am working on; something similar to the way last.fm works. Currently, I have a compatibility function for users which could come into play. It ranks…
Austin Platt
  • 119
  • 5
10
votes
2 answers

How to group nearby latitude and longitude locations stored in SQL

I'm trying to analyse data from cycle accidents in the UK to find statistical black spots. Here is an example of the data from another website: http://www.cycleinjury.co.uk/map I am currently using SQLite to store ~100k lat/lon locations. I want…
Robert
  • 37,670
  • 37
  • 171
  • 213
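
A cheap first pass at finding dense spots among latitude/longitude points is to snap each point to a grid cell and count points per cell. The sketch below is in Python rather than SQL and is an illustration only: the function name `grid_counts`, the cell size of 0.001 degrees (about 111 m of latitude), and the sample coordinates are all assumptions.

```python
import math
from collections import Counter


def grid_counts(locations, cell_deg=0.001):
    """Count points per grid cell; a cell is identified by integer (lat, lon) indices."""
    counts = Counter()
    for lat, lon in locations:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical sample: two points a few metres apart and one far away.
locations = [(51.5004, -0.1204), (51.5006, -0.1206), (53.4808, -2.2426)]
counts = grid_counts(locations)
busiest_cell, busiest_count = counts.most_common(1)[0]
```

The same idea carries over to SQL by grouping on the rounded coordinates. Two caveats: a hot spot that straddles a cell boundary is split between cells, and a degree of longitude covers less ground at higher latitudes, so cells are not square.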
10
votes
1 answer

How to get clusters to line up on the diagonal using heatmap.2 in r?

I am trying to cluster a protein-DNA interaction dataset and draw a heatmap using heatmap.2 from the R package gplots. Here is the complete process that I am following to generate these graphs: Generate a distance matrix using some correlation in…
Alos
  • 2,657
  • 5
  • 35
  • 47
10
votes
3 answers

clustering with cosine similarity

I have a large data set that I would like to cluster. My trial run set size is 2,500 objects; when I run it on the 'real deal' I will need to handle at least 20k objects. These objects have a cosine similarity between them. This cosine similarity…
10
votes
5 answers

large scale clustering library possibly with python bindings

I've been trying to cluster a larger dataset consisting of 50000 measurement vectors with dimension 7. I'm trying to generate about 30 to 300 clusters for further processing. I've been trying the following clustering implementations with no…
tisch
  • 1,098
  • 3
  • 13
  • 30