Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

Clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include expectation maximization (EM), spectral clustering, and correlation clustering.

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
2
votes
1 answer

Visualization of multi-dimensional data clusters in R

For a set of documents, I have a feature matrix of size 30 x 32, where rows represent documents and columns represent features. So basically 30 documents and 32 features for each of them. After running a PSO algorithm, I have been able to find some cluster…
QPTR
2
votes
1 answer

Clustering in Gephi 0.8.2

I'm working with a dataset in Gephi that is derived from a friends table from a BuddyPress site. I've done a number of useful things to the graph using the built-in functionality, but would be interested in a better clustering algorithm…
apellico
2
votes
0 answers

sklearn.mixture.DPGMM: only one cluster?

I have a dataset for which I keep getting odd results with the Dirichlet process Gaussian mixture model in sklearn. import sklearn.mixture, pandas import numpy as np from matplotlib import pyplot as plt A = np.random.normal(0, .5,200) B = …
2
votes
1 answer

Assign class to data frame after clustering

I used k-means cluster algorithm on a data-frame df1 and the result is shown in the picture below. library(ade4) df1 <- data.frame(x=runif(100), y=runif(100)) plot(df1) km <- kmeans(df1,…
Michał
2
votes
1 answer

How to get the point coordinates and cluster labels from R clusplot()

I use the k-medoids algorithm pam to do clustering based on the (symmetric) distance matrix, tmp, below: if(!require("cluster")) { install.packages("cluster"); require("cluster") } tmp <- matrix(tmp <- matrix(c( 0, 20, 20, 20, 40, 60, 60, …
Zhubarb
2
votes
2 answers

How to cluster large datasets

I have a very large dataset of documents (500 million) and want to cluster all documents according to their content. What would be the best way to approach this? I tried using k-means but it does not seem suitable because it needs all documents at…
fwind
2
votes
1 answer

calculating similarity between two profiles for number of common features

I am working on a clustering problem of social network profiles, where each profile document is represented by the number of times the term of interest occurs in the profile description. To do clustering effectively, I am trying to find the correct…
2
votes
2 answers

computing z-scores for 2D matrices in scipy/numpy in Python

How can I compute the z-score for matrices in Python? Suppose I have the array: a = array([[ 1, 2, 3], [ 30, 35, 36], [2000, 6000, 8000]]) and I want to compute the z-score for each row. The solution I came up…
user248237
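
A minimal sketch of the row-wise z-score asked about above, assuming NumPy is available and that the population standard deviation (ddof=0) is wanted. The function name row_zscores is hypothetical and only used for illustration; scipy.stats.zscore(a, axis=1) should give the same result under the same ddof.

```python
import numpy as np

def row_zscores(a):
    """Return z-scores computed independently for each row of a 2D array.

    Assumption: population standard deviation (ddof=0) and no constant rows,
    since a constant row has zero spread and would divide by zero.
    """
    a = np.asarray(a, dtype=float)
    # keepdims=True keeps the per-row statistics as column vectors,
    # so they broadcast across each row.
    mean = a.mean(axis=1, keepdims=True)
    std = a.std(axis=1, keepdims=True)
    return (a - mean) / std

a = np.array([[1, 2, 3],
              [30, 35, 36],
              [2000, 6000, 8000]])
z = row_zscores(a)
print(z)
```

After this transformation every row has mean 0 and standard deviation 1, regardless of how different the original row scales were.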
2
votes
3 answers

Which clustering method is suitable for which kind of data?

I would like to know which type of data k-means is best suited for. When does k-means fail? For which type of data set does k-means not give an accurate answer? Which type of data is COBWEB best suited for? When COBWEB…
Arpana
2
votes
2 answers

After clustering in R (iGraph, etc), can you maintain nodes+edges from a cluster to do individual cluster analysis?

Basically I have tried a few different ways of clustering. I can usually get to a point in iGraph where each node is labeled with a cluster. I can then identify all the nodes within a single cluster. However, this loses their edges. I'd have to…
2
votes
0 answers

How to extract points from clusplot graph?

I’m trying to extract the points (IDs) that occur in both ellipses from the graph produced by the function clusplot below. library(cluster) # Creates a sample data set. y <- matrix(runif(5000,max=1,min=0), 1000, 5,…
Samuel Shamiri
2
votes
2 answers

Determining if a set of coordinates are within the same area

When I say coordinates I mean latitude and longitude coordinates on Earth. I want to determine if a set of coordinates are within the same area (my cutoff is 200 miles). I've been googling "cluster algorithm" but I'm uncertain which would work best…
Nick Dat Le
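
One possible reading of "within the same area" is that every pair of points lies within the cutoff distance of each other. The sketch below takes that interpretation as an assumption and uses the haversine great-circle formula with a mean Earth radius of 3958.8 miles. The names haversine_miles and all_within, and the sample city coordinates, are illustrative and not from the question.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def all_within(points, cutoff_miles=200.0):
    """True if every pair of points is at most cutoff_miles apart.

    Assumption: 'same area' means the maximum pairwise distance
    does not exceed the cutoff. This check is O(n^2) in the number of points.
    """
    return all(haversine_miles(p, q) <= cutoff_miles
               for p, q in combinations(points, 2))

# Approximate city coordinates, used only as sample input.
new_york = (40.7128, -74.0060)
philadelphia = (39.9526, -75.1652)
los_angeles = (34.0522, -118.2437)

print(all_within([new_york, philadelphia]))
print(all_within([new_york, philadelphia, los_angeles]))
```

For grouping many points into several areas rather than testing a single set, a clustering method that accepts a distance threshold would be a more natural fit than this all-pairs check.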
2
votes
1 answer

Data Mining and Unbalanced Classes

I have unbalanced classes of records and the data is like the following:
X  Y  Z          Class
1  4  Good       A
3  5  Very Good  A
7  6  Good       A
8  7  Excellent  A
4  8  Pass       A
3  7  Good       …
2
votes
0 answers

Constrained k-medoids clustering in R

I am looking for a way to implement semi-supervised clustering, possibly constrained clustering in R, particularly the "cannot-link" part (I think - but see below). I found this question, but I don't know these languages. I have a certain data set…
user3554004
2
votes
0 answers

Plotting clusters using k-means with distance from centroid

I am trying to create a plot similar to this: Here there are three clusters and all the data points (circles) are plotted according to their Euclidean distance from the centroid. Using this image it's easy to see that 5 samples from class 2 ended up…
Anthony