Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.
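
One concrete reading of "similar in some sense" is "close in Euclidean distance". The sketch below assumes exactly that. It treats two points as similar when their distance is at most a threshold eps, and it forms clusters as connected groups of similar points, which is single-linkage clustering cut at height eps. The function name threshold_clusters and the parameter eps are hypothetical and chosen only for this illustration. The pairwise loop is O(n^2), so it only suits small inputs.

```python
import math


def threshold_clusters(points, eps):
    """Hypothetical helper: points closer than eps (directly or through a chain
    of intermediate points) end up in the same cluster. Uses union-find."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points whose Euclidean distance is within eps.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    result = []
    for i in range(len(points)):
        root = find(i)
        labels.setdefault(root, len(labels))
        result.append(labels[root])
    return result


pts = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.3, 9.8), (0.2, 0.4)]
print(threshold_clusters(pts, eps=1.0))  # [0, 0, 1, 1, 0]
```

Because single linkage chains through intermediate points, two members of the same cluster can be much farther apart than eps. If every pair inside a cluster must be close, a different criterion is needed, such as complete linkage.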

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
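
k-means, the first algorithm listed above, fits in a few lines of pure Python. The following is a minimal, unoptimised sketch of the standard iteration (often called Lloyd's algorithm). It assumes Euclidean distance and initialises from randomly chosen input points. The name kmeans and its parameters are hypothetical. In practice a library implementation such as scikit-learn's sklearn.cluster.KMeans would normally be used.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on a list of coordinate tuples.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids with k randomly chosen input points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels)
print(centroids)
```

The starting centroids are drawn at random, so different seeds can converge to different local optima. This is one reason why repeated runs of k-means may return different clusterings.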

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise your question is probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

6 answers

Fast (< n^2) clustering algorithm

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be bounding spheres with a specified radius). That means that there probably has…

algorithm machine-learning cluster-analysis data-mining k-means

asked Dec 09 '10 at 23:11

John Hawksley

1 answer

Clustering text documents using scikit-learn kmeans in Python

I need to use scikit-learn's KMeans to cluster text documents. The example code works fine as is, but it takes some 20newsgroups data as input. I want to use the same code to cluster a list of documents, as shown below: documents =…

python python-2.7 scikit-learn cluster-analysis k-means

asked Jan 11 '15 at 17:20

Nabila Shahid

3 answers

Understanding concept of Gaussian Mixture Models

I'm trying to understand GMMs by reading the sources available online. I have done clustering with k-means and wanted to see how GMM compares to k-means. Here is what I have understood; please let me know if my understanding is wrong: GMM is like…

matlab machine-learning classification cluster-analysis mixture-model

asked Sep 24 '14 at 14:33

StuckInPhDNoMore

2 answers

Estimation of number of Clusters via gap statistics and prediction strength

I am trying to translate the R implementations of the gap statistic and prediction strength http://edchedch.wordpress.com/2011/03/19/counting-clusters/ into Python scripts to estimate the number of clusters in the iris data, which has 3 clusters. Instead…

python r cluster-analysis k-means

asked Jan 08 '14 at 17:39

Riyaz

3 answers

Clustering values by their proximity in python (machine learning?)

I have an algorithm that runs on a set of objects and produces a score reflecting the differences between the elements in the set. The sorted output is something like…

python machine-learning cluster-analysis data-mining

asked Aug 21 '13 at 17:31

PCoelho

2 answers

Hierarchical clustering of 1 million objects

Can anyone point me to a hierarchical clustering tool (preferably in Python) that can cluster ~1 million objects? I have tried hcluster and Orange. hcluster had trouble with 18k objects. Orange was able to cluster 18k objects in seconds, but…

python machine-learning cluster-analysis data-mining hierarchical-clustering

asked Feb 06 '12 at 07:40

Atish Kathpal

2 answers

Group n points in k clusters of equal size

Possible Duplicate: K-means algorithm variation with equal cluster size. EDIT: As casperOne pointed out to me, this question is a duplicate. Anyway, here is a more general question that covers this one:…

algorithm cluster-analysis k-means

asked Jan 09 '12 at 23:30

Pierre-David Belanger

5 answers

Distributed hierarchical clustering

Are there any algorithms that can help with hierarchical clustering? Google's MapReduce has only an example of k-clustering. In the case of hierarchical clustering, I'm not sure how it's possible to divide the work between nodes. Other resource that I…

algorithm cluster-analysis hierarchical-clustering

asked Sep 17 '08 at 16:00

Roman

4 answers

Clustering results change on every run in Python scikit-learn

I have a bunch of sentences and I want to cluster them using scikit-learn spectral clustering. I've run the code and got results with no problem, but every time I run it I get different results. I know this is a problem with initialization, but I…

python scikit-learn cluster-analysis k-means spectral

asked Sep 18 '14 at 20:28

user3430235

3 answers

Clustering text in Python

I need to cluster some text documents and have been researching various options. It looks like LingPipe can cluster plain text without prior conversion (to vector space, etc.), but it's the only tool I've seen that explicitly claims to work on…

python cluster-analysis nlp

asked Nov 24 '09 at 10:43

Dan

2 answers

What is the difference between a Confusion Matrix and Contingency Table?

I'm writing a piece of code to evaluate my clustering algorithm, and I find that every kind of evaluation method needs the basic data from an m×n matrix A = {a_ij}, where a_ij is the number of data points that are members of class c_i and elements…

matrix cluster-analysis data-mining difference

asked Sep 30 '11 at 15:56

MangMang
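
The m×n matrix in the excerpt can be built directly from two label lists. The sketch below assumes that a_ij counts the points whose true class is c_i and whose assigned cluster is the j-th cluster. The excerpt is truncated, so this reading of the columns is an assumption. The function name contingency_table is hypothetical. scikit-learn also ships a comparable helper, sklearn.metrics.cluster.contingency_matrix.

```python
from collections import Counter


def contingency_table(classes, clusters):
    """Hypothetical helper building A = {a_ij}, where a_ij is the number of
    points with class class_labels[i] assigned to cluster cluster_labels[j]."""
    class_labels = sorted(set(classes))
    cluster_labels = sorted(set(clusters))
    counts = Counter(zip(classes, clusters))  # (class, cluster) -> count
    table = [[counts[(c, k)] for k in cluster_labels] for c in class_labels]
    return class_labels, cluster_labels, table


true_classes = ["a", "a", "b", "b", "b"]
predicted = [0, 0, 0, 1, 1]
rows, cols, A = contingency_table(true_classes, predicted)
print(rows, cols, A)  # ['a', 'b'] [0, 1] [[2, 0], [1, 2]]
```

Nothing in this construction requires the number of clusters to equal the number of classes, so the matrix need not be square.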

6 answers

Merge related words in NLP

I'd like to define a new word which includes count values from two (or more) different words. For example:

        Words  Frequency
0         mom        250
1        2020        151
2         the        124
3          19         82
4      mother         81
...       ...        ...
10     London          6
11       life          6
12  something          6

I…

python nlp cluster-analysis word2vec wordnet

asked Sep 02 '20 at 12:44

user13623188

2 answers

How does pytorch backprop through argmax?

I'm building k-means in PyTorch using gradient descent on the centroid locations instead of expectation-maximisation. The loss is the sum of squared distances from each point to its nearest centroid. To identify which centroid is nearest to each point, I use…

machine-learning cluster-analysis pytorch k-means backpropagation

asked Mar 03 '19 at 14:06

jammygrams

2 answers

Python: String clustering with scikit-learn's DBSCAN, using Levenshtein distance as metric

I have been trying to cluster multiple datasets of URLs (around 1 million each) to find the original and the typos of each URL. I decided to use the Levenshtein distance as a similarity metric, along with DBSCAN as the clustering algorithm as…

python machine-learning scikit-learn cluster-analysis levenshtein-distance

asked Aug 02 '16 at 12:20

KaziJehangir

3 answers

Detecting object regions in image opencv

We're currently trying to detect the object regions in images of medical instruments using the methods available in OpenCV, C++ version. An example image is shown below: Here are the steps we're following: Converting the image to gray…

opencv cluster-analysis object-detection connected-components

asked May 20 '15 at 14:40

Maystro
