Questions tagged [cluster-analysis]

Cluster analysis is the task of grouping objects into subsets (called clusters) so that objects in the same cluster are similar in some sense while objects in different clusters are dissimilar, together with the analysis of the resulting groups.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
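
As a rough illustration of what "grouping similar objects" means in practice, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are assumptions made for this example; library implementations typically use random or k-means++ seeding and handle many more edge cases.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Assumption for reproducibility: the first k points seed the centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data: two visibly separate groups in 2D.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this sample the points near (1, 1) end up with one label and the points near (8, 8) with the other. No labels were supplied up front, which is the sense in which clustering is unsupervised.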

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

2 answers

Google Maps API v3, lots of markers, clustering and performance

I have about 5000 markers I need to render on Google Map. I'm currently using the API (v3) and there are performance issues on slower machines, especially in IE. I have done the following already to help speed things up: Used a simple marker class…

performance google-maps google-maps-api-3 google-maps-markers cluster-analysis

asked Feb 27 '11 at 11:42

JamieNewman

1 answer

Using a smoother with the L Method to determine the number of K-Means clusters

Has anyone tried to apply a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results? Or allow a lower number of k-means trials and hence much greater…

algorithm cluster-analysis k-means linear-regression

asked Oct 27 '10 at 13:35

winwaed

7 answers

Clustering Lat/Longs in a Database

I'm trying to see if anyone knows how to cluster some Lat/Long results, using a database, to reduce the number of results sent over the wire to the application. There are a number of resources about how to cluster, either on the client side OR in…

database latitude-longitude cluster-analysis geography

asked Dec 01 '08 at 04:36

Pure.Krome

2 answers

How do I manually create a dendrogram (or "hclust") object? (in R)

I have a dendrogram given to me as images. Since it is not very large, I can construct it "by hand" into an R object. So my question is how do I manually create a dendrogram (or "hclust") object when all I have is the dendrogram image? I see that…

r cluster-analysis dendrogram

asked Feb 22 '10 at 12:50

Tal Galili

2 answers

What is the state-of-the-art in unsupervised learning on temporal data?

I'm looking for an overview of the state-of-the-art methods that find temporal patterns (of arbitrary length) in temporal data and are unsupervised (no labels). In other words, given a stream/sequence of (potentially high-dimensional) data, how do…

machine-learning cluster-analysis time-series pattern-recognition unsupervised-learning

asked Aug 07 '12 at 21:06

schaul

2 answers

Plotting dendrogram in Scipy error for large dataset

I am using SciPy for hierarchical clustering. I do manage to get flat clusters on a threshold using fcluster. But I need to visualize the dendrogram formed. When I use the dendrogram method, it works fine for 5-6k user vectors. But my dataser…

python scipy cluster-analysis dendrogram

asked Apr 18 '12 at 06:42

Maxwell

3 answers

Equivalent of Matlab's cluster quality function?

MATLAB has a nice silhouette function to help evaluate the number of clusters for k-means. Is there an equivalent for Python's Numpy/Scipy as well?

python matlab numpy cluster-analysis scipy

asked Jul 10 '11 at 23:29

Legend

2 answers

DBSCAN with custom metric

I have the following given: a dataset in the range of thousands; a way of computing the similarity, but the datapoints themselves I cannot plot in Euclidean space. I know that DBSCAN should support a custom distance metric, but I don't know how to…

python scikit-learn cluster-analysis

asked Feb 13 '18 at 13:29

zython

1 answer

initial centroids for scikit-learn kmeans clustering

If I already have a numpy array that can serve as the initial centroids, how can I properly initialize the kmeans algorithm? I am using the scikit-learn KMeans class. This post (k-means with selected initial centers) indicates that I only need to set…

python scikit-learn cluster-analysis k-means

asked Jul 13 '16 at 14:54

webmaker

2 answers

Incremental clustering algorithm for grouping news articles?

I'm doing a little research on how to cluster articles into 'news stories' à la Google News. Looking at previous questions here on the subject, I often see it recommended to simply pull out a vector of words from an article, weight some of the words…

cluster-analysis

asked Aug 31 '10 at 18:32

Peter

3 answers

How to use NLP to separate an unstructured text content into distinct paragraphs?

The following unstructured text has three distinct themes -- Stallone, Philadelphia and the American Revolution. But which algorithm or technique would you use to separate this content into distinct paragraphs? Classifiers won't work in this…

text nlp classification cluster-analysis text-segmentation

asked Jul 13 '10 at 13:30

user193116

1 answer

overplot multiple sets of data with hexbin

I am doing some KMeans clustering on a large and really dense data set and I am trying to figure out the best way to visualize the clusters. In 2D, it looks like hexbin would do a good job but I am unable to overplot the clusters on the same…

python matplotlib cluster-analysis scatter-plot seaborn

asked Jul 20 '15 at 18:31

Labibah

2 answers

How to identify Cluster labels in kmeans scikit learn

I am learning python scikit. The example given here displays the top occurring words in each Cluster and not Cluster name. http://scikit-learn.org/stable/auto_examples/document_clustering.html I found that the km object has "km.label" which lists…

python machine-learning scikit-learn cluster-analysis k-means

asked Feb 05 '15 at 13:00

vij555

4 answers

How to find cluster sizes in 2D numpy array?

My problem is the following: I have a 2D numpy array filled with 0 and 1, with an absorbing boundary condition (all the outer elements are 0), for example: [[0 0 0 0 0 0 0 0 0 0] [0 0 1 0 0 0 0 0 0 0] [0 0 1 0 1 0 0 0 1 0] [0 0 0 0 0 0 1 0 1 0] …

python arrays numpy block cluster-analysis

asked Sep 04 '14 at 11:47

Cecilia

3 answers

Clustering of news articles

My scenario is pretty straightforward: I have a bunch of news articles (~1k at the moment) for which I know that some cover the same story/topic. I now would like to group these articles based on shared story/topic, i.e., based on their…

machine-learning nlp cluster-analysis information-retrieval unsupervised-learning

asked Aug 10 '14 at 11:39

Christian
