Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is an unsupervised learning method used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation-maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
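As a rough illustration of the first algorithm named above, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). The function name `kmeans`, its parameters, and the random seeding are assumptions made for this example; it is not taken from any particular library and makes no attempt at good initialization.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the algorithm has converged.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # Leave the centroid of an empty cluster unchanged.
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

The numeric label given to each cluster depends on the random start, so only the grouping itself is meaningful.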

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

5 answers

Algorithm for clustering pictures based on date taken

Does anyone know of an algorithm that will group pictures into events based on the date each picture was taken? Obviously I can group by date, but I'd like something a little more sophisticated that would (or might) be able to group pictures spanning…
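One simple approach to this kind of grouping, sketched here under assumptions, is to sort the timestamps and start a new event whenever the gap to the previous picture exceeds a threshold. The function name `group_by_gap` and the six-hour default are hypothetical choices for illustration, not something stated in the question.

```python
from datetime import datetime, timedelta


def group_by_gap(timestamps, max_gap=timedelta(hours=6)):
    """Split timestamps into events wherever consecutive shots are far apart."""
    events = []
    for ts in sorted(timestamps):
        # Continue the current event if this picture is close enough in time
        # to the previous one; otherwise open a new event.
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)
        else:
            events.append([ts])
    return events


shots = [
    datetime(2009, 3, 5, 22, 30),
    datetime(2009, 3, 6, 0, 15),   # after midnight, but still the same event
    datetime(2009, 3, 6, 1, 5),
    datetime(2009, 3, 8, 14, 0),   # two days later: a new event
]
events = group_by_gap(shots)
```

Because the split depends on gaps rather than calendar dates, pictures taken either side of midnight stay together as long as they are close in time.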

algorithm cluster-analysis grouping

asked Mar 06 '09 at 08:26

Greg Dean

4 answers

WEKA K-Means Clustering

Can anybody explain what the output of the K-Means clustering in WEKA actually means? For example: kMeans Number of iterations: 9 Within cluster sum of squared errors: 9434.911100488926 Missing values globally replaced with mean/mode Cluster…

cluster-analysis data-mining weka k-means

asked Apr 26 '11 at 14:09

Chris Taylor

3 answers

Clustering images using unsupervised Machine Learning

I have a database of images that contains identity cards, bills, and passports. I want to classify these images into different groups (i.e., identity cards, bills, and passports). From what I have read, one of the ways to do this task is clustering…

python computer-vision cluster-analysis k-means unsupervised-learning

asked Oct 09 '18 at 12:58

singrium

1 answer

What are noisy samples in Scikit's DBSCAN clustering algorithm?

If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The documentation calls them noisy samples. What are…

python scikit-learn cluster-analysis dbscan

asked Jul 25 '17 at 20:44

Auxiliary

4 answers

how to plot a k-distance graph in python

How do I plot (in Python) the distance graph for a given value of min-points in DBSCAN? I am looking for the knee and the corresponding epsilon value. In sklearn I do not see any method that returns such distances. Am I missing something?
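The values behind a k-distance graph can be computed directly: for every point, take the distance to its k-th nearest neighbour, then sort the results. The brute-force sketch below is an illustration only. The function name `k_distances` is hypothetical, and the choice to exclude the point itself when counting neighbours is an assumption, since conventions differ on whether min-points includes the point.

```python
import math


def k_distances(points, k):
    """Return each point's distance to its k-th nearest neighbour, ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
curve = k_distances(points, k=2)
```

Plotting `curve` against its index with any plotting library gives the graph; the knee is where the sorted distances start to rise sharply. This version is quadratic in the number of points, so a neighbour-search structure would be the usual choice for larger data sets.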

python cluster-analysis dbscan

asked Apr 01 '17 at 18:00

Mauro Gentile

3 answers

clustering list of words in python

I am a newbie in text mining; here is my situation. Suppose I have a list of words ['car', 'dog', 'puppy', 'vehicle'], and I would like to cluster the words into k groups, so that the output is [['car', 'vehicle'], ['dog', 'puppy']]. I first calculate…

python nlp cluster-analysis text-mining

asked Jan 31 '17 at 11:25

Kevin Lee

2 answers

Scikit-learn, KMeans: How to use max_iter

I'd like to understand the parameter max_iter from the class sklearn.cluster.KMeans. According to the documentation: max_iter : int, default: 300 Maximum number of iterations of the k-means algorithm for a single run. But in my opinion if I have…

python parameters scikit-learn cluster-analysis k-means

asked Dec 01 '16 at 10:10

C-Jay

2 answers

permuting the rows and columns of a matrix for clustering

I have a distance matrix that is 1000x1000 in dimension and symmetric, with 0s along the diagonal. I want to form groupings of distances (clusters) by simultaneously reordering the rows and columns of the matrix. This is like reordering a matrix…

matrix cluster-analysis permutation

asked Sep 04 '10 at 05:35

user439463

2 answers

opencv euclidean clustering vs findContours

I have the following image mask: I want to apply something similar to cv::findContours, but that algorithm only joins connected points in the same groups. I want to do this with some tolerance, i.e., I want to add the pixels near each other within…

c++ opencv cluster-analysis

asked Nov 20 '15 at 11:11

manatttta

1 answer

In WildFly 9, how to set up stateful EJB session replication with two nodes in standalone mode (clustering)

I want to do clustering with an EAR project. I found one solution for running standalone in a cluster using the standalone-ha.xml configuration. I followed the article "Clustering in domain mode with wildfly9", and it is working fine. But I want to run ERP project…

java session cluster-analysis wildfly stateful-session-bean

asked Oct 08 '15 at 06:49

Aditi

2 answers

Choosing the number of clusters in hierarchical agglomerative clustering with scikit

The Wikipedia article on determining the number of clusters in a dataset indicated that I do not need to worry about such a problem when using hierarchical clustering. However, when I tried to use scikit-learn's agglomerative clustering I see that I…

machine-learning scikit-learn artificial-intelligence cluster-analysis unsupervised-learning

asked Aug 26 '15 at 09:18

DaTaBomB

3 answers

How to find Local maxima in Kernel Density Estimation?

I'm trying to make a filter (to remove outliers and noise) using kernel density estimators (KDE). I applied KDE to my 3D (d=3) data points, which gives me the probability density function (PDF) f(x). Now, as we know, local maxima of density estimation…

python machine-learning cluster-analysis kernel-density

asked Jul 03 '15 at 03:36

jquery404

1 answer

Affinity Propagation (sklearn) - strange behavior

Trying to use affinity propagation for a simple clustering task:

    from sklearn.cluster import AffinityPropagation
    c = [[0], [0], [0], [0], [0], [0], [0], [0]]
    af = AffinityPropagation(affinity='euclidean').fit(c)
    print(af.labels_)

I get this…

scikit-learn cluster-analysis

asked Jun 14 '15 at 13:25

Baba

2 answers

Clustering Categorical data using jaccard similarity

I am trying to build a clustering algorithm for categorical data. I have read about different algorithms like k-modes, ROCK, and LIMBO; however, I would like to build one of my own and compare its accuracy and cost to the others. I have (m) training set and…
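For reference, Jaccard similarity between two sets is the size of their intersection divided by the size of their union. The sketch below applies it to categorical records by treating each record as a set of (attribute, value) pairs. That representation, the function names, and the sample records are assumptions for illustration; they are not the asker's data or method.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables of hashable items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # Assumption: two empty sets are treated as identical.
    return len(a & b) / len(a | b)


def record_similarity(r1, r2):
    """Compare two categorical records (dicts) via their (attribute, value) pairs."""
    return jaccard_similarity(r1.items(), r2.items())


r1 = {"color": "red", "shape": "round", "size": "small"}
r2 = {"color": "red", "shape": "square", "size": "small"}
similarity = record_similarity(r1, r2)  # 2 shared pairs out of 4 distinct pairs
distance = 1.0 - similarity             # usable as a dissimilarity for clustering
```

Using `1 - similarity` as a distance makes the measure usable with any clustering method that accepts a precomputed dissimilarity matrix.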

python-2.7 machine-learning cluster-analysis data-mining k-means

asked May 09 '15 at 12:47

Sam

1 answer

clusplot - showing variables

I would like to add to a clusplot plot the variables used for pca as arrows. I am not sure that a way has been implemented (I can't find anything in the documentation). I have produced a clusplot that looks like this: With the package princomp I…

r cluster-analysis pca

asked Apr 21 '15 at 11:11

Dario Lacan
