Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
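As a rough illustration of the first algorithm named above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the sample `points`, and the choice to seed the centroids with the first `k` points are assumptions made for this example; library implementations such as scikit-learn's use more careful initialization.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical toy data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No labels are supplied; the grouping comes only from the distances between points, which is what makes this unsupervised.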

Related topics: classification, pattern recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

6 answers

scikit-learn: Finding the features that contribute to each KMeans cluster

Say you have 10 features you are using to create 3 clusters. Is there a way to see the level of contribution each of the features has for each of the clusters? What I want to be able to say is that for cluster k1, features 1, 4, 6 were the primary…

python scikit-learn cluster-analysis k-means

asked Dec 15 '14 at 19:01

cmgerber

2 answers

scikit-learn: clustering text documents using DBSCAN

I'm trying to use scikit-learn to cluster text documents. On the whole, I find my way around, but I have problems with specific issues. Most of the examples I found illustrate clustering using scikit-learn with k-means as the clustering algorithm.…

machine-learning scikit-learn cluster-analysis data-mining dbscan

asked Aug 09 '14 at 09:22

Christian

7 answers

Can k-means clustering do classification?

I want to know whether the k-means clustering algorithm can do classification. Suppose I have done a simple k-means clustering: I have a lot of data, I run k-means, and I get 2 clusters, A and B. The centroid calculating method is…

algorithm cluster-analysis data-mining k-means

asked Mar 10 '14 at 13:00

Sirius Wang

2 answers

DBSCAN in scikit-learn of Python: save the cluster points in an array

Following the example "Demo of DBSCAN clustering algorithm" from scikit-learn, I am trying to store in an array the x, y of each clustering class: import numpy as np from sklearn.cluster import DBSCAN from sklearn import metrics from…

python cluster-analysis scikit-learn dbscan

asked Aug 14 '13 at 16:40

Gianni Spear

8 answers

Map Clustering Algorithm

My current code is pretty quick, but I need to make it even faster so we can accommodate even more markers. Any suggestions? Notes: The code runs fastest when the SQL statement is ordered by marker name - which itself does a very partial job of…

php performance algorithm google-maps cluster-analysis

asked Sep 16 '09 at 16:58

Chris B

7 answers

Multidimensional Euclidean Distance in Python

I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between 2 arrays. I'm using numpy-Scipy. Here is my code: import numpy,scipy; A=numpy.array([116.629, 7192.6, 4535.66, 279714, 176404, 443608, 295522, 1.18399e+07,…

python numpy scipy cluster-analysis euclidean-distance

asked Feb 23 '12 at 14:13

garak

5 answers

How can I perform K-means clustering on time series data?

How can I do K-means clustering of time series data? I understand how this works when the input data is a set of points, but I don't know how to cluster a time series with dimensions 1×M, where M is the data length. In particular, I'm not sure how to update…

matlab time-series cluster-analysis data-mining k-means

asked Aug 17 '10 at 14:44

Jaz

8 answers

Java Clustering Library

I am looking for a lightweight clustering library in Java. I don't need hundreds of clustering algorithms in that library; just 5 to 7 would be fine for me. I am sure you are going to ask: "what kind of algorithms do you need and for what purpose" :). I just…

java math cluster-analysis

asked Jan 24 '10 at 22:56

user238384

3 answers

Better text documents clustering than tf/idf and cosine similarity?

I'm trying to cluster the Twitter stream. I want to put each tweet into a cluster of tweets that talk about the same topic. I tried to cluster the stream using an online clustering algorithm with tf/idf and cosine similarity but I found that the results are…

machine-learning data-mining cluster-analysis text-mining

asked Jul 08 '13 at 23:40

Jack Twain

1 answer

How can I fix a MemoryError when executing scikit-learn's silhouette score?

I run a clustering algorithm and want to evaluate the result using the silhouette score in scikit-learn. But scikit-learn needs to calculate the distance matrix: distances = pairwise_distances(X, metric=metric, **kwds) Due to the fact that…

memory machine-learning cluster-analysis scikit-learn

asked May 07 '13 at 17:06

Thien Bao

1 answer

How to get Agglomerative Clustering "Centroid" in python Scikit-learn

This code is what I am using for silhouette_score. Here I am using Agglomerative Clustering with Ward linkage. I would like to get the "Centroid" of each cluster; is that possible with Agglomerative Clustering? I could only get…

python pandas scikit-learn cluster-analysis centroid

asked Jun 05 '19 at 08:13

Pandalove

6 answers

Grouping similar news contents together like in Google News

I am unable to manage the RSS feeds easily due to an overwhelming number of new stories / similar news contents posted in various news sites. For subjects such as world news and business news, many of the stories are redundant, adding a burden to…

php rss cluster-analysis feed

asked Oct 18 '10 at 10:09

Gourav

1 answer

Clustering cosine similarity matrix

A few questions on Stack Overflow mention this problem, but I haven't found a concrete solution. I have a square matrix which consists of cosine similarities (values between 0 and 1), for example: | A | B | C | D A | 1.0 | 0.1 | 0.6 | 0.4 B…

python math scikit-learn cluster-analysis data-mining

asked May 06 '15 at 23:58

Stefan D

1 answer

How to use 'hclust' as function call in R

I tried to construct the clustering method as a function in the following way: mydata <- mtcars # Here I construct hclust as a function hclustfunc <- function(x) hclust(as.matrix(x),method="complete") # Define distance metric distfunc <- function(x)…

r cluster-analysis function-calls hclust

asked Dec 03 '13 at 05:14

neversaint

4 answers

Best clustering algorithm? (simply explained)

Imagine the following problem: You have a database containing about 20,000 texts in a table called "articles". You want to connect the related ones using a clustering algorithm in order to display related articles together. The algorithm should do…

algorithm text cluster-analysis data-mining text-mining

asked May 12 '09 at 14:38

caw
