Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
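
As a rough illustration of what an algorithm such as k-means does, here is a minimal sketch of Lloyd's algorithm in plain Python. The function name `kmeans`, the toy points, and the explicitly supplied starting centroids are assumptions made for this example; library implementations add smarter initialization and convergence handling.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm on equal-length numeric tuples (illustrative sketch)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The starting centroids are passed in explicitly so the run is deterministic; with random initialization, k-means can converge to different local optima on harder data.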

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

2 answers

Non-density-based data clustering algorithm

I'm working on a cluster analysis program that takes a set of points S as input and labels each point with the index of the cluster it belongs to. I've implemented the DBSCAN and OPTICS algorithms and they both work as expected. However, the…

c++ c algorithm cluster-analysis data-mining

asked Oct 03 '10 at 17:43

dotminic

1 answer

Effect of mat2gray on multithresh

I do not get why a segmentation obtained by using multithresh on an "original" double image is different from a segmentation using the same parameters on the same image scaled by mat2gray. E.g.: testimage = randi(100,[200…

matlab image-processing cluster-analysis image-segmentation

asked Jul 04 '16 at 15:01

user1809923

2 answers

revealing clusters of interaction in igraph

I have an interaction network and I used the following code to make an adjacency matrix and subsequently calculate the dissimilarity between the nodes of the network and then cluster them to form…

r cluster-analysis igraph hierarchical-clustering

asked Jul 02 '16 at 17:10

johnny utah

3 answers

DBSCAN vs OPTICS for Automatic Clustering

I know that DBSCAN requires two parameters (minPts and Eps). However, I am confused about which parameters OPTICS needs, because some sources say it requires eps while others say it only requires minPts. Which algorithm would be the better to…

algorithm cluster-analysis dbscan optics-algorithm

asked Jun 27 '16 at 22:46

user3315340
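
To make the two DBSCAN parameters from the question above concrete, here is a small plain-Python sketch of the algorithm. The function name `dbscan`, the `NOISE` constant, and the sample points are assumptions for illustration, not any library's API. In the original OPTICS formulation, minPts is still required, while eps acts as an upper bound on the search radius and can be set very large, which may explain why sources describe it differently.

```python
from math import dist  # available in Python 3.8+

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE (-1). Illustrative sketch."""
    def neighbors(i):
        # The eps-neighborhood of a point includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical data: a group of four, a group of three, and one outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan(pts, eps=1.5, min_pts=4))  # [0, 0, 0, 0, -1, -1, -1, -1]
```

Raising `min_pts` from 3 to 4 turns the three-point group into noise, which shows how strongly the result depends on that parameter for a fixed `eps`.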

1 answer

XMeans ELKI fails at every third input file

I'm trying to cluster image data (stored in 100 separate CSV files) with ELKI's XMeans algorithm. It works well for the first two files, but then the algorithm hangs forever while processing the third file. It looks like the problem occurs at…

cluster-analysis elki

asked Jun 27 '16 at 20:08

Charlie28000

2 answers

How to use WeightedCluster::wcKMedoids to provide clustering for heatmap or heatmap.2 in R?

TL;DR: How to use the WeightedCluster library (the wcKMedoids() method in particular) as input to heatmap, heatmap.2 or similar, to provide it with clustering info? We are creating a heatmap from some binary data (yes/no values, represented as ones…

r cluster-analysis heatmap

asked Jun 17 '16 at 17:46

Samuel Lampa

1 answer

Choose the number of clusters and vertices in python igraph

I have a complete weighted graph, as you can see in the image below. The Goal: My goal is to be able to choose the number of clusters and the number of vertices in each cluster using Python's implementation of igraph. What I've Tried So Far: import…

python graph cluster-analysis graph-theory igraph

asked Jun 15 '16 at 19:27

jackzellweger

1 answer

Text clustering using arbitrary metrics with sklearn kmeans

I'm running text clustering on a table that contains medical terms. I want to cluster strings that have similar words: if two strings have two or more words in common, they should be more likely to end up in one cluster than if they have only one word in common. I…

python cluster-analysis k-means cosine-similarity

asked Jun 02 '16 at 20:14

Lelo

0 answers

K-modes clustering in R for categorical data with NAs

dat <- data.frame(x=sample(letters[1:3],20,TRUE),y=sample(LETTERS[7:9],20,TRUE),stringsAsFactors=FALSE)
dat[c(1:5,9,17,20),1] <- NA
dat[c(8,11),2] <- NA
dat
     x y
1 <NA> H
2 <NA> I
3 <NA> G
4 <NA> H
5 <NA> I
6    c H
7…

r cluster-analysis na categorical-data

asked Jun 01 '16 at 19:27

Roy C

1 answer

Are there advantages of using sklearn KMeans versus SciPy kmeans?

From the documentation of sklearn KMeans class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1) and SciPy…

python scipy scikit-learn cluster-analysis k-means

asked May 13 '16 at 13:02

pepe

1 answer

Determining effects of clustering

What effects do noisy, redundant, and irrelevant attributes have on clustering? Do they end up helping or hurting it? I know that it is unable to handle noisy data, but I'm not sure about the other two.

machine-learning cluster-analysis data-mining

asked May 09 '16 at 03:44

chris551

2 answers

Is this the expected behavior of the DBSCAN algorithm (two identical data samples not fitting in the same cluster)?

Please forgive the lack of formal terms; I've only recently approached ML. For learning purposes, I decided to try a Ruby implementation of the DBSCAN algorithm (https://github.com/matiasinsaurralde/dbscan). Building on the simple example at…

arrays ruby machine-learning cluster-analysis dbscan

asked May 08 '16 at 23:03

Redoman

0 answers

Why are the results different between hclust and heatmap.2 when using the same clustering functions?

I'm trying to understand my data a bit better by doing some cluster analysis. Using the same data, I first ran hclust with this code: # Dissimilarity matrix df <-scale(m.sel) d <- dist(df, method = "euclidean") # Hierarchical clustering using…

r cluster-analysis heatmap hclust

asked May 02 '16 at 18:28

Fabiola Fernández

1 answer

String clustering in Python

I have a list of strings and I want to classify it by using clustering in Python. list = ['String1', 'String2', 'String3',...] I want to use Levenshtein distance, so I used jellyfish library. Given two strings, I know that their distance can be…

python string scipy cluster-analysis

asked Apr 30 '16 at 01:39

Muny
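
One possible way to approach the string-clustering question above, sketched in plain Python: compute pairwise Levenshtein distances and merge any strings whose distance falls within a threshold, which amounts to single-linkage clustering cut at that threshold. The `levenshtein` function here is a hand-written stand-in for a library distance function, and `cluster_strings`, `max_distance`, and the word list are hypothetical names and data chosen for this example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster_strings(strings, max_distance):
    """Group strings linked by chains of pairs within max_distance."""
    parent = list(range(len(strings)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            if levenshtein(strings[i], strings[j]) <= max_distance:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


words = ["kitten", "sitten", "sitting", "apple", "apply", "banana"]
for group in cluster_strings(words, max_distance=2):
    print(group)
```

Because single linkage chains neighbours together, "kitten" and "sitting" (distance 3) land in the same group via "sitten". A stricter linkage or a smaller threshold would be needed if that chaining is unwanted.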

1 answer

Computation of clusters

I am testing out a few clustering algorithms on a dataset of text documents (with word frequencies as features). Running some of the methods of Scikit Learn Clustering one after the other, below is how long they take on ~ 50,000 files with 26…

python numpy nlp cluster-analysis k-means

asked Apr 28 '16 at 14:50

patrick
