Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
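
As a rough illustration of the first algorithm named above, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). The function name `kmeans`, its parameters, and the sample points are made up for this example. Passing the initial centroids in explicitly is an assumption made to keep the run deterministic; real implementations usually pick them randomly or with k-means++.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and an update step.

    `points` and `centroids` are lists of equal-length numeric tuples.
    The initial centroids are supplied by the caller (an assumption of
    this sketch, made so that the result is reproducible).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data: two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result depends on the starting centroids, which is one reason library implementations typically run several initializations and keep the best one.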

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

2 answers

C/C++ Machine Learning Libraries for Clustering

What are some C/C++ machine learning libraries that support clustering of multidimensional data (for example, k-means)? So far I have come across SGI MLC++ (http://www.sgi.com/tech/mlc/) and OpenCV MLL. I am tempted to roll my own, but I am sure…

c++ c cluster-analysis machine-learning

asked May 02 '09 at 19:18

The Unknown

3 answers

Server-side clustering for Google Maps API v3

I am currently developing a kind of Google Maps overview widget that displays locations as markers on the map. The number of markers varies from several hundred up to thousands (10,000 and up). Right now I am using MarkerClusterer for Google…

php google-maps-api-3 server-side google-maps-markers cluster-analysis

asked Sep 23 '11 at 12:28

mayrs

3 answers

k-means clustering implementation in JavaScript?

I'm in need of a JavaScript implementation of the k-means clustering algorithm. I only have 1-dimensional data and rarely more than 100 items, so performance is not an issue. PS: I could only find one but it seems extremely unsteady, resulting in…

javascript cluster-analysis k-means

asked Sep 10 '11 at 09:16

stephanos

2 answers

Mixed variables (categorical and numerical) distance function

I want to fuzzy cluster a set of jobs. The job attributes are: categorical (position, diploma, skills) and numerical (salary, years of experience). My question is: how do I calculate the distance between different jobs? e.g…

cluster-analysis distance data-mining

asked Aug 07 '11 at 14:27

Mariya

4 answers

Clustering using the Latent Dirichlet Allocation algorithm in gensim

Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it?

python algorithm cluster-analysis latent-semantic-indexing

asked Jun 26 '11 at 21:03

Sharmila

4 answers

K-means with really large matrix

I have to perform k-means clustering on a really huge matrix (about 300,000 × 100,000 values, which is more than 100 GB). I want to know whether I can use R or Weka to do this. My computer is a multiprocessor machine with 8 GB of RAM and hundreds of GB…

r cluster-analysis weka k-means mahout

asked Jun 16 '11 at 13:08

Delphine

2 answers

How to get the centroids in DBSCAN sklearn?

I am using DBSCAN for clustering. Now I want to pick a point from each cluster that represents it, but I realized that DBSCAN does not have centroids as k-means does. However, I observed that DBSCAN has something called core points. I am…

python scikit-learn cluster-analysis dbscan

asked Jun 05 '20 at 12:58

EmJ

3 answers

R - 'princomp' can only be used with more units than variables

I am using R (R Commander) to cluster my data. I have a smaller subset of my data containing 200 rows and about 800 columns. I am getting the following error when trying to run k-means clustering and plot the result on a graph: "'princomp' can only be used with…

r cluster-analysis k-means pca r-commander

asked Apr 16 '11 at 13:54

CoolSteve

2 answers

Should we use k-means++ instead of k-means?

The k-means++ algorithm helps with the following two points of the original k-means algorithm: The original k-means algorithm has a worst-case running time that is super-polynomial in the input size, while k-means++ is claimed to be O(log k). The approximation…

algorithm performance comparison cluster-analysis k-means

asked Jan 16 '11 at 16:53

Karl

5 answers

How to generate Bad Random Numbers

I'm sure the opposite has been asked many times, but I couldn't find any answers on how to generate bad random numbers. I want to write a small program for cluster analysis and want to generate some random points for testing. If I would just insert…

random cluster-analysis prng data-generation

asked Nov 04 '10 at 16:15

Nicolas

1 answer

Understanding DynamicTreeCut algorithm for cutting a dendrogram

A dendrogram is a data structure used with hierarchical clustering algorithms that groups clusters at different "heights" of a tree, where the heights correspond to distance measures between clusters. After a dendrogram is created from some input…

algorithm cluster-analysis hierarchical-clustering dendrogram unsupervised-learning

asked Sep 03 '16 at 15:48

Siler

0 answers

Using precision recall metric on a hierarchy of recovered clusters

Context: We are two students intending to write a thesis on reverse engineering namespaces using hierarchical agglomerative clustering algorithms. We have a variety of linking methods and other tweaks to the algorithm that we want to try out. We will…

cluster-analysis hierarchical-clustering precision-recall

asked Apr 05 '16 at 10:51

David

5 answers

3D clustering Algorithm

Problem Statement: I have the following problem: There are more than a billion points in 3D space. The goal is to find the top N points which have the largest number of neighbors within a given distance R. Another condition is that the distance between any…

algorithm 3d cluster-analysis spatial data-partitioning

asked Aug 14 '10 at 05:30

Teng Lin

4 answers

Using Silhouette Clustering in Spark

I want to use the silhouette score to determine the optimal value for k when using KMeans clustering in Spark. Is there an optimal way to parallelize this, i.e. make it scalable?

machine-learning apache-spark cluster-analysis distributed-computing k-means

asked Aug 06 '15 at 18:24

zunior

1 answer

How to spread out a community graph made using the igraph package in R

Trying to find communities in tweet data. The cosine similarity between different words forms the adjacency matrix. Then, I created a graph from that adjacency matrix. Visualizing the graph is the task here: # Document Term Matrix dtm =…

r cluster-analysis igraph graph-visualization

asked Feb 25 '15 at 09:46

magarwal
