Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

Clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include expectation maximization (EM), spectral clustering and correlation clustering.
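
As a concrete illustration of unsupervised grouping, here is a minimal sketch of k-means (Lloyd's algorithm), which many of the questions listed below use. It is plain Python for brevity; the names `kmeans` and `sq_dist`, the parameters, and the sample points are illustrative assumptions, not part of the tag description.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct data points as start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The result depends on the random starting centroids, so real analyses usually run several restarts and keep the solution with the lowest within-cluster sum of squares.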

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
2 votes, 2 answers

How to plot clusters of kmeans in R and show centroids?

I have a dataset that has 6497 instances, 12 attributes, and a class variable called q (quality). The class values can range from 3 to 9. The data can be downloaded in CSV format from here. I am doing k-means clustering on this dataset and would like to…
birdy
2 votes, 1 answer

Cluster assignment remapping

I have labelled test classification datasets from the UCI Machine Learning Repository. I am stripping off the labels and using the data to benchmark a few clustering algorithms, and then I am planning to use external validation methods. I will…
phoxis
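
One way to approach the remapping described in this question is to try every assignment of cluster ids to class labels and keep the one with the most matches. The sketch below is a brute-force illustration in Python under that assumption; `remap_clusters` and the sample labels are hypothetical, and the search is factorial in the number of clusters, so it only suits a small k.

```python
from itertools import permutations


def remap_clusters(true_labels, cluster_labels):
    """Map arbitrary cluster ids onto class labels by maximizing agreement.

    Returns (mapping, remapped_labels). Brute force over permutations,
    so only practical for a handful of clusters.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; no one-to-one mapping")
    best_map, best_hits = None, -1
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        # Count points whose remapped cluster id equals the true class.
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map, [best_map[c] for c in cluster_labels]


true_labels = ["a", "a", "b", "b", "c"]
cluster_labels = [2, 2, 0, 0, 1]
mapping, remapped = remap_clusters(true_labels, cluster_labels)
print(mapping, remapped)
```

For larger k, the same matching is usually solved with the Hungarian algorithm on the cluster-by-class contingency table rather than by enumeration.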
2 votes, 2 answers

get consensus of multiple partitioning methods in R

My data:
data=cbind(c(1,1,2,1,1,3),c(1,1,2,1,1,1),c(2,2,1,2,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:6)
I want the output to show that, according to majority vote, there are two communities (with their elements). Something…
Antoine
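
A majority vote over partitions can be sketched with a co-association count: two items are grouped together when more than half of the methods put them in the same cluster. The Python below uses the six-method, three-item matrix from the question (one row per method); `consensus_partition`, the 0.5 threshold, and the transitive merging of pairs are assumptions made for illustration, not the asker's method.

```python
from itertools import combinations


def consensus_partition(partitions, threshold=0.5):
    """Group items that co-occur in more than `threshold` of the partitions.

    `partitions` holds one label list per method, each of length n_items.
    Pairs above the threshold are merged transitively with union-find.
    """
    n_items = len(partitions[0])
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n_items), 2):
        agree = sum(p[i] == p[j] for p in partitions) / len(partitions)
        if agree > threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Rows are methods 1..6, columns are items 1..3 (as in the question's matrix).
methods = [
    [1, 1, 2],
    [1, 1, 2],
    [2, 2, 1],
    [1, 1, 2],
    [1, 1, 1],
    [3, 1, 2],
]
communities = consensus_partition(methods)
print(communities)  # [[0, 1], [2]]
```

Items 1 and 2 share a cluster in five of the six methods, while item 3 joins them in only one, so the vote yields two communities. Comparing pairs rather than raw labels sidesteps the fact that cluster ids are arbitrary across methods.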
2 votes, 2 answers

R- Consecutive K-means clustering operations in R

Let's assume that we have a 10x5 dataset containing 5 chemical measurements (e.g., var1, var2, var3, var4, var5) on 10 wine samples (rows). We'd like to cluster the wine samples based on the chemical measurements using k-means clustering. It's quite easy to…
2 votes, 1 answer

Clustering in Torch

I am trying to learn the Torch library for machine learning. I know that the focus of Torch is neural networks, but just for the sake of it I was trying to run k-means on it. If nothing else, Torch implements fast contiguous storage which should be…
Andrea
2 votes, 1 answer

Group variables by clusters on heatmap in R

I am trying to reproduce the first figure of this paper on graph clustering: Here is a sample of my adjacency matrix: …
Antoine
2 votes, 2 answers

k means clustering result storing for later use

I am exploring the R programming environment for performing cluster analysis on my test data. For testing I am using a single-column data set with the following scatter plot and histogram plotted against the value index. From the data I feel the…
Soumajit
2 votes, 0 answers

CCC (Cubic Clustering Criterion) doesn't match in R and SAS

I calculated the CCC metric in R (package NbClust) and in SAS (https://support.sas.com/documentation/onlinedoc/v82/techreport_a108.pdf). All the Pseudo-F and R-square values match the SAS output exactly, except for E_R2 and hence CCC. I have…
Chaks
2 votes, 1 answer

Different clustering algorithms to cluster timeseries events

I have a very large input file with the following format: ID \t time \t duration \t Description \t status. The status column is limited to lower-case a, s, i, upper-case A, S, I, or a mix of the two (sample element in the status column: a,si,…
LKT
2 votes, 1 answer

Algorithm for clustering names

I have people's names (first name, last name and surname) in a DB column. The data is incomplete; for example, some rows have only a first name, last name or surname, are in a different order (surname, last name), or are incorrectly spelled. I need an algorithm to…
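
One possible starting point for grouping such names is a string-similarity threshold applied after normalizing case and token order. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; `normalize`, `similarity`, `cluster_names`, the 0.8 threshold, and the sample names are assumptions for illustration. It does not handle rows that hold only part of a name, which would need a separate rule.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case the name and sort its tokens so word order is ignored."""
    return " ".join(sorted(name.lower().split()))


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_names(names, threshold=0.8):
    """Greedy clustering: a name joins the first cluster whose first
    member is at least `threshold` similar, else it starts a new cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


names = ["John Smith", "Smith John", "Jon Smith", "Maria Garcia"]
name_clusters = cluster_names(names)
print(name_clusters)
```

The greedy pass is order-dependent and compares each name only with a cluster's first member; a pairwise distance matrix fed to hierarchical clustering is a more thorough alternative when the list is small enough.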
2 votes, 1 answer

Heatmap vs image function in R

I just noticed that the plots from the heatmap() and image() functions look different even though I'm using the same data matrix. I have the following code: set.seed(12345) dataMatrix <- matrix(rnorm(400), nrow =40) set.seed(678910) for(i…
user3922546
2 votes, 1 answer

Principal Component Analysis in a cluster via MPI

I am setting up a set of computers on which to run math programs on top of MPI. Do you know whether there is a library that does PCA (Principal Component Analysis) using MPI, so as to use all the resources of the networked PCs? I will have a look at ScaLAPACK,…
user311906
2 votes, 2 answers

How to perform clustering of lat/lon data points

My preferred algorithm is DBSCAN in scikit-learn. I am not sure, however, if (and how) to incorporate the radius in addition to the latitude and longitude that I use already. My second question is how to compute the centers of the new clusters. Any ideas?
user706838
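
For the second part of this question, cluster centers on a sphere can be computed by averaging the points as 3-D unit vectors and converting the mean back to latitude and longitude. The Python sketch below assumes points are (lat, lon) pairs in degrees and that the label -1 marks noise, as scikit-learn's DBSCAN does; `haversine_km`, `cluster_centers`, and the Earth radius of 6371 km are illustrative choices.

```python
import math
from collections import defaultdict


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def cluster_centers(points, labels):
    """Return {label: (lat, lon)} using the mean of 3-D unit vectors.

    Points labelled -1 (noise) are skipped.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for (lat, lon), label in zip(points, labels):
        if label == -1:
            continue
        la, lo = math.radians(lat), math.radians(lon)
        s = sums[label]
        s[0] += math.cos(la) * math.cos(lo)
        s[1] += math.cos(la) * math.sin(lo)
        s[2] += math.sin(la)
    centers = {}
    for label, (x, y, z) in sums.items():
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        lon = math.degrees(math.atan2(y, x))
        centers[label] = (lat, lon)
    return centers


points = [(0.0, 0.0), (0.0, 90.0), (10.0, 10.0)]
labels = [0, 0, -1]
centers = cluster_centers(points, labels)
print(centers, haversine_km(points[0], points[1]))
```

Averaging raw latitude and longitude values works for small regions but breaks near the poles and the 180° meridian, which is why the sketch goes through unit vectors. When clustering itself, note that scikit-learn's haversine metric expects coordinates in radians.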
2 votes, 2 answers

RapidMiner and WEKA : Different clustering result

I am new to data mining analytics and machine learning. I have been trying to compare predictive analysis and cluster analysis using RapidMiner and Weka for my college assignment. Just after I studied the advantages and disadvantages from…
M.R. Murazza
2 votes, 0 answers

Set Minimum Observation Per Cluster in R

I am new to R and would like to ask if there is a way to set the minimum number of observations per cluster in R. I am currently using k-means. Sometimes my clusters look like this:
Clusters:
   1    2    3    4
 762   24  553 4013
But I want the…
jbest