Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

Clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include expectation maximization (EM), spectral clustering, and correlation clustering.
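As an illustration of the definition above, here is a minimal sketch of k-means (Lloyd's algorithm), one widely used partitioning method. The function name `kmeans`, its parameters, and the sample points are assumptions made for this example and do not come from the tag wiki; the starting centroids are passed in explicitly so that the run is deterministic.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid nearest to points[i].
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Points in the same group receive the same label, which matches the notion of "similar within a cluster, dissimilar across clusters". In practice the result depends on the choice of starting centroids, so library implementations typically try several initialisations.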

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question that does not directly concern implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise your question is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
13 votes, 1 answer

Assign new data point to cluster in kernel k-means (kernlab package in R)?

I have a question about the kkmeans function in the kernlab package of R. I am new to this package and please forgive me if I'm missing something obvious here. I would like to assign a new data point to a cluster in a set of clusters that were…
carl5978
13 votes, 3 answers

clustering with NA values in R

I was surprised to find out that clara from library(cluster) allows NAs. But the function documentation says nothing about how it handles these values. So my questions are: How does clara handle NAs? Can this be somehow used for kmeans (NAs not…
danas.zuokas
12 votes, 2 answers

How to create a cluster plot in R?

How can I create a cluster plot in R without using clustplot? I am trying to get to grips with some clustering (using R) and visualisation (using HTML5 Canvas). Basically, I want to create a cluster plot but instead of plotting the data, I want to…
slotishtype
12 votes, 4 answers

Is a Fuzzy C-Means algorithm available for Python?

I have some dots in a 3-dimensional space and would like to cluster them. I know Python's module "cluster", but it has only K-Means. Do you know a module which has FCM (Fuzzy C-Means)? (If you know some other Python modules which are related to…
Martin Thoma
12 votes, 5 answers

How to get the K most distant points, given their coordinates?

We have a boring CSV with 10000 rows of ages (float), titles (enum/int), scores (float), .... We have N columns, each with int/float values, in a table. You can imagine this as points in N-dimensional space. We want to pick K points that would have maximised…
DuckQueen
12 votes, 3 answers

Global Dynamic Supervisor in a cluster

I have a unique issue that I have not had a need to address in Elixir. I need to use the dynamic supervisor to start (n) children dynamically in a clustered environment. I am using libcluster to manage the clustering and use the global…
Botonomous
12 votes, 4 answers

Clustering ~100,000 Short Strings in Python

I want to cluster ~100,000 short strings by something like q-gram distance or simple "bag distance" or maybe Levenshtein distance in Python. I was planning to fill out a distance matrix (100,000 choose 2 comparisons) and then do hierarchical…
135498
12 votes, 1 answer

How to generate performance stats of clustering from flexclust?

After trying a few clustering algorithms, I got the best performance on my dataset using flexclust::kcca with family = kccaFamily("angle"). Here's an example using the Nclus dataset from flexclust. library(fpc) library(flexclust) data(Nclus) k <-…
Richie Cotton
12 votes, 4 answers

DBSCAN on spark : which implementation

I would like to do some DBSCAN on Spark. I have currently found 2 implementations: https://github.com/irvingc/dbscan-on-spark https://github.com/alitouka/spark_dbscan I have tested the first one with the sbt configuration given in its github but:…
Benjamin
12 votes, 2 answers

Which programming structure for clustering algorithm

I am trying to implement the following (divisive) clustering algorithm (a short form of the algorithm is presented below; the full description is available here): Start with a sample x_i, i = 1, ..., n regarded as a single cluster of n data points and a…
Andrej
12 votes, 1 answer

hierarchical clustering on correlations in Python scipy/numpy?

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster by correlations of each entry across the 9 conditions. I'd like to use 1-pearson…
user248237
12 votes, 2 answers

NA in clustering functions (kmeans, pam, clara). How to associate clusters to original data?

I need to cluster some data and I tried kmeans, pam, and clara with R. The problem is that my data are in a column of a data frame and contain NAs. I used na.omit() to get my clusters. But then how can I associate them with the original data? The…
Bakaburg
12 votes, 3 answers

clustering very large dataset in R

I have a dataset consisting of 70,000 numeric values representing distances ranging from 0 to 50, and I want to cluster these numbers; however, if I try the classical clustering approach, then I would have to establish a 70,000X70,000…
DOSMarter
12 votes, 3 answers

mahout lucene document clustering howto?

I'm reading that I can create Mahout vectors from a Lucene index that can be used to apply the Mahout clustering algorithms. http://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text I would like to apply K-means clustering…
maiky
12 votes, 3 answers

Identify clusters in SOM (Self Organizing Map)

Once I have collected and organized data in a SOM, how do I identify clusters? (Items are aggregated and clustered using many traits, upwards of 10.) Specifically, I want to find the 'center' of the cluster, therefore giving me the 'center' node(s).
Tyler Wall