Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation-maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
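
As an illustration of the first algorithm named above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function names `kmeans` and `squared_distance`, the fixed iteration count, and the toy points are assumptions made for this example; real projects would normally use a library implementation instead.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster a list of numeric tuples into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialisation, so only the grouping itself is meaningful, not the label values.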

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If your question does not directly concern implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise it is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

1 answer

Visualization of multi-dimensional data clusters in R

For a set of documents, I have a feature matrix of size 30 × 32, where rows represent documents and columns represent features. So basically 30 documents and 32 features for each of them. After running a PSO algorithm, I have been able to find some cluster…

r plot cluster-analysis

asked May 25 '15 at 15:12

QPTR

1 answer

Clustering in Gephi 0.8.2

I'm working with a dataset in Gephi that is derived from a friends table from a BuddyPress site. I've done a number of useful things to the graph using the built-in functionality, but would be interested in a better clustering algorithm…

cluster-analysis social-networking gephi sna

asked May 20 '15 at 19:35

apellico

0 answers

sklearn.mixture.DPGMM: only one cluster?

I have a dataset for which I keep getting odd results with the Dirichlet process Gaussian mixture model in sklearn. import sklearn.mixture, pandas import numpy as np from matplotlib import pyplot as plt A = np.random.normal(0, .5,200) B = …

python machine-learning scikit-learn cluster-analysis

asked May 19 '15 at 19:29

imadrin

1 answer

Assign class to data frame after clustering

I used k-means cluster algorithm on a data-frame df1 and the result is shown in the picture below. library(ade4) df1 <- data.frame(x=runif(100), y=runif(100)) plot(df1) km <- kmeans(df1,…

r cluster-analysis data-mining k-means

asked May 15 '15 at 13:05

Michał

1 answer

How to get the point coordinates and cluster labels from R clusplot()

I use the k-medoids algorithm pam to do clustering based on the (symmetric) distance matrix, tmp, below: if(!require("cluster")) { install.packages("cluster"); require("cluster") } tmp <- matrix(tmp <- matrix(c( 0, 20, 20, 20, 40, 60, 60, …

r ggplot2 cluster-analysis

asked May 12 '15 at 14:01

Zhubarb

2 answers

How to cluster large datasets

I have a very large dataset (500 million documents) and want to cluster all documents according to their content. What would be the best way to approach this? I tried using k-means but it does not seem suitable because it needs all documents at…

algorithm data-structures cluster-analysis

asked May 12 '15 at 10:01

fwind

1 answer

calculating similarity between two profiles for number of common features

I am working on a clustering problem of social network profiles, and each profile document is represented by the number of times the 'term of interest' occurs in the profile description. To do clustering effectively, I am trying to find the correct…

machine-learning cluster-analysis similarity unsupervised-learning

asked May 04 '15 at 07:19

Yantraguru

2 answers

computing z-scores for 2D matrices in scipy/numpy in Python

How can I compute the z-score for matrices in Python? Suppose I have the array: a = array([[ 1, 2, 3], [ 30, 35, 36], [2000, 6000, 8000]]) and I want to compute the z-score for each row. The solution I came up…

python numpy cluster-analysis machine-learning scipy

asked Jun 06 '10 at 17:29

user248237

3 answers

Which clustering method is suitable for which kind of data?

I would like to know which type of data k-means is best suited to clustering. When does k-means fail? For which type of data set does k-means not give an accurate answer? Which type of data is COBWEB best suited to clustering? When COBWEB…

algorithm cluster-analysis

asked Jun 04 '10 at 10:49

Arpana

2 answers

After clustering in R (iGraph, etc), can you maintain nodes+edges from a cluster to do individual cluster analysis?

Basically I have tried a few different ways of clustering. I can usually get to a point in iGraph where each node is labeled with a cluster. I can then identify all the nodes within a single cluster. However, this loses their edges. I'd have to…

r cluster-analysis igraph

asked Apr 19 '15 at 17:00

SuirouNoJutsu

0 answers

How to extract points from clusplot graph?

I’m trying to extract the points (IDs) that occur in both ellipses from the graph produced by the function clusplot below. library(cluster) # Creates a sample data set. y <- matrix(runif(5000,max=1,min=0), 1000, 5,…

r cluster-analysis k-means

asked Apr 17 '15 at 14:23

Samuel Shamiri

2 answers

Determining if a set of coordinates are within the same area

When I say coordinates I mean latitude and longitude coordinates on Earth. I want to determine if a set of coordinates are within the same area (my cutoff is 200 miles). I've been googling "cluster algorithm" but I'm uncertain which would work best…

c# algorithm coordinates geospatial cluster-analysis

asked Jun 02 '10 at 17:15

Nick Dat Le

1 answer

Data Mining and Unbalanced Classes

I have unbalanced classes of records and the data is like the following:

X  Y  Z          Class
1  4  Good       A
3  5  Very Good  A
7  6  Good       A
8  7  Excellent  A
4  8  Pass       A
3  7  Good       …

statistics classification cluster-analysis data-mining decision-tree

asked Apr 09 '15 at 19:39

Ahmed Alashrafy

0 answers

Constrained k-medoids clustering in R

I am looking for a way to implement semi-supervised clustering, possibly constrained clustering in R, particularly the "cannot-link" part (I think - but see below). I found this question, but I don't know these languages. I have a certain data set…

r graph constraints cluster-analysis wordnet

asked Apr 06 '15 at 16:27

user3554004

0 answers

Plotting clusters using k-means with distance from centroid

I am trying to create a plot similar to this: Here there are three clusters and all the data points (circles) are plotted according to their Euclidean distance from the centroid. Using this image it's easy to see that 5 samples from class 2 ended up…

r cluster-analysis k-means

asked Apr 01 '15 at 18:20

Anthony
