Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into sets known as "clusters", along with the analysis of the resulting groups.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
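As a rough illustration of what such an algorithm does, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The function name `kmeans`, the sample `points`, and the choice of taking the first `k` points as initial centroids are assumptions made for this example; real libraries typically use smarter initialisation and vectorised arithmetic.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster a list of equal-length numeric tuples into k groups."""
    # Assumption: use the first k points as starting centroids so the
    # result is deterministic (not k-means++ or random restarts).
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical sample data: two well-separated groups in 2D.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Points that end up with the same label are closer to their shared centroid than to any other centroid, which is the "similar within, dissimilar between" idea described above.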

Related topics: classification, pattern recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If your question does not directly concern implementation, it is probably off-topic here; consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

1 answer

Assign new data point to cluster in kernel k-means (kernlab package in R)?

I have a question about the kkmeans function in the kernlab package of R. I am new to this package, so please forgive me if I'm missing something obvious here. I would like to assign a new data point to a cluster in a set of clusters that were…

r machine-learning cluster-analysis k-means kernlab

asked Jul 23 '12 at 22:48

carl5978

3 answers

clustering with NA values in R

I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values. So my questions are: How does clara handle NAs? Can this somehow be used for kmeans (NAs not…

r cluster-analysis

asked May 23 '12 at 13:46

danas.zuokas

2 answers

How to create a cluster plot in R?

How can I create a cluster plot in R without using clustplot? I am trying to get to grips with some clustering (using R) and visualisation (using HTML5 Canvas). Basically, I want to create a cluster plot but instead of plotting the data, I want to…

r plot cluster-analysis

asked Jan 26 '12 at 14:31

slotishtype

4 answers

Is a Fuzzy C-Means algorithm available for Python?

I have some points in a 3-dimensional space and would like to cluster them. I know Python's module "cluster", but it has only K-Means. Do you know a module which has FCM (Fuzzy C-Means)? (If you know some other Python modules which are related to…

python cluster-analysis fuzzy-c-means

asked Jul 18 '11 at 16:47

Martin Thoma

5 answers

How to get the K most distant points, given their coordinates?

We have a boring CSV with 10000 rows of ages (float), titles (enum/int), scores (float), .... We have N columns, each with int/float values, in a table. You can imagine this as points in ND space. We want to pick K points that would have maximised…

python cluster-analysis metrics points

asked Jun 25 '20 at 13:45

DuckQueen

3 answers

Global Dynamic Supervisor in a cluster

I have a unique issue that I have not had a need to address in Elixir. I need to use the dynamic supervisor to start (n) children dynamically in a clustered environment. I am using libcluster to manage the clustering and use the global…

elixir cluster-analysis

asked Oct 01 '18 at 13:30

Botonomous

4 answers

Clustering ~100,000 Short Strings in Python

I want to cluster ~100,000 short strings by something like q-gram distance or simple "bag distance" or maybe Levenshtein distance in Python. I was planning to fill out a distance matrix (100,000 choose 2 comparisons) and then do hierarchical…

python numpy cluster-analysis levenshtein-distance

asked Nov 22 '10 at 02:27

135498

1 answer

How to generate performance stats of clustering from flexclust?

After trying a few clustering algorithms, I got the best performance on my dataset using flexclust::kcca with family = kccaFamily("angle"). Here's an example using the Nclus dataset from flexclust. library(fpc) library(flexclust) data(Nclus) k <-…

r cluster-analysis

asked Aug 03 '16 at 06:39

Richie Cotton

4 answers

DBSCAN on spark : which implementation

I would like to do some DBSCAN on Spark. I have currently found 2 implementations: https://github.com/irvingc/dbscan-on-spark https://github.com/alitouka/spark_dbscan I have tested the first one with the sbt configuration given in its github but:…

scala apache-spark cluster-analysis apache-spark-mllib dbscan

asked Mar 18 '16 at 17:39

Benjamin

2 answers

Which programming structure for clustering algorithm

I am trying to implement the following (divisive) clustering algorithm (a short form of the algorithm is presented below; the full description is available here): Start with a sample x, i = 1, ..., n regarded as a single cluster of n data points and a…

python data-structures cluster-analysis hierarchical-clustering

asked Aug 20 '15 at 06:49

Andrej

1 answer

hierarchical clustering on correlations in Python scipy/numpy?

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster by correlations of each entry across the 9 conditions. I'd like to use 1-pearson…

python numpy cluster-analysis machine-learning scipy

asked May 25 '10 at 19:39

user248237

2 answers

NA in clustering functions (kmeans, pam, clara). How to associate clusters to original data?

I need to cluster some data and I tried kmeans, pam, and clara with R. The problem is that my data are in a column of a data frame and contain NAs. I used na.omit() to get my clusters. But then how can I associate them with the original data? The…

r cluster-analysis k-means na missing-data

asked Dec 18 '14 at 11:54

Bakaburg

3 answers

clustering very large dataset in R

I have a dataset consisting of 70,000 numeric values representing distances ranging from 0 to 50, and I want to cluster these numbers; however, if I try the classical clustering approach, then I would have to establish a 70,000X70,000…

r machine-learning bigdata cluster-analysis data-mining

asked Feb 24 '14 at 10:24

DOSMarter

3 answers

mahout lucene document clustering howto?

I'm reading that I can create Mahout vectors from a Lucene index that can be used to apply the Mahout clustering algorithms. http://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text I would like to apply K-means clustering…

indexing lucene cluster-analysis k-means mahout

asked Dec 04 '09 at 10:17

maiky

3 answers

Identify clusters in SOM (Self Organizing Map)

Once I have collected and organized data in a SOM, how do I identify clusters? (Items are aggregated and clustered using many traits - upwards of 10.) Specifically, I want to find the 'center' of the cluster, therefore giving me the 'center' node(s).

cluster-analysis som

asked Oct 25 '12 at 18:31

Tyler Wall

