Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine-learning and data-mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical-clustering.
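As a rough illustration of what such an algorithm does with unlabeled data, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The function name `kmeans`, the choice of the first `k` points as starting centroids, and the sample points are all assumptions made for this example; real libraries use better initialization and handle more edge cases.

```python
import math

def kmeans(points, k, iterations=100):
    """Naive Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: the first k points serve as initial centroids
    # (deterministic, but a poor choice for real data).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

No labels are supplied anywhere; the grouping emerges only from the distances between points, which is what "unsupervised" means here. `math.dist` requires Python 3.8 or later.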

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

1 answer

Scipy's sparse eigsh() for small eigenvalues

I'm trying to write a spectral clustering algorithm using NumPy/SciPy for larger (but still tractable) systems, making use of SciPy's sparse linear algebra library. Unfortunately, I'm running into stability issues with eigsh(). Here's my…

python scipy cluster-analysis linear-algebra sparse-matrix

asked Aug 25 '12 at 21:54

Magsol

1 answer

Bisecting k-means clustering algorithm explanation

I was required to write a bisecting k-means algorithm, but I didn't understand the algorithm. I know the k-means algorithm. Can you explain the algorithm, but not in academic language? Thanks.

algorithm cluster-analysis k-means

asked Jul 29 '11 at 10:04

Nir

8 answers

ALGORITHM - String similarity score/hash

Is there a method to calculate something like general "similarity score" of a string? In a way that I am not comparing two strings together but rather I get some number/scores (hash) for each string that can later tell me that two strings are or are…

python string algorithm cluster-analysis hash

asked Jul 12 '11 at 14:00

Ajay

3 answers

Affinity Propagation preferences initialization

I need to perform clustering without knowing in advance the number of clusters. The number of clusters may be from 1 to 5, since I may find cases where all the samples belong to the same instance, or to a limited number of groups. I thought affinity…

machine-learning scikit-learn cluster-analysis unsupervised-learning

asked Oct 17 '15 at 13:45

alessandro.ferrari

1 answer

DIvisive ANAlysis (DIANA) Hierarchical Clustering

(This post is a continuation of my previous question on the divisive hierarchical clustering algorithm.) The problem is how to implement this algorithm in Python (or any other language). Algorithm description: A divisive clustering proceeds by a series of…

python algorithm cluster-analysis hierarchical-clustering

asked Aug 26 '15 at 14:12

Andrej

4 answers

Python: DBSCAN in 3 dimensional space

I have been searching around for an implementation of DBSCAN for 3 dimensional points without much luck. Does anyone know a library that handles this or has any experience with doing this? I am assuming that the DBSCAN algorithm can handle 3…

python cluster-analysis dbscan

asked Oct 07 '14 at 21:59

user2909415

1 answer

PCA multiplot in R

I have a dataset that looks like this: India China Brasil Russia SAfrica Kenya States Indonesia States Argentina Chile Netherlands HongKong 0.0854026763 0.1389383234 0.1244184371 0.0525460881 0.2945586244 0.0404562539 …

r plot 3d cluster-analysis pca

asked Jun 18 '14 at 09:38

Angelo

4 answers

clustering image segments in opencv

I am working on motion detection with non-static camera using opencv. I am using a pretty basic background subtraction and thresholding approach to get a broad sense of all that's moving in a sample video. After thresholding, I enlist all separable…

c++ c opencv image-processing cluster-analysis

asked May 24 '14 at 08:32

Ekansh Gupta

6 answers

Clustered Graphs Visualization Techniques

I need to visualize a relatively large graph (6K nodes, 8K edges) that has the following properties: Distinct Clusters. Approximately 50-100 Nodes per cluster and moderate interconnectivity at the cluster level Minimal (5-10 inter-cluster edges per…

graph cluster-analysis visualization

asked Mar 01 '10 at 15:42

jameszhao00

3 answers

Which data clustering algorithm is appropriate to detect an unknown number of clusters in a time series of events?

Here's my scenario. Consider a set of events that happen at various places and times - as an example, consider someone high above recording the lightning strikes in a city during a storm. For my purpose, lightnings are instantaneous and can only hit…

algorithm language-agnostic cluster-analysis

asked Feb 20 '10 at 06:19

wishihadabettername

2 answers

Estimating/Choosing optimal Hyperparameters for DBSCAN

I need to find naturally occurring classes of nouns based on their distribution with different prepositions (like agentive, instrumental, time, place etc.). I tried using k-means clustering, but it was of little help; it didn't work well, there was a lot of…

data-mining cluster-analysis dbscan

asked Feb 24 '13 at 09:29

Riyaz

2 answers

How can i cluster document using k-means (Flann with python)?

I want to cluster documents based on similarity. I have tried ssdeep (similarity hashing), which is very fast, but I was told that k-means is faster and flann is the fastest of all implementations, and more accurate, so I am trying flann with Python bindings but…

nlp cluster-analysis data-mining k-means text-mining

asked Sep 19 '12 at 14:51

Phyo Arkar Lwin

3 answers

Find groups with high cross correlation matrix in Matlab

Given a lower triangular matrix (100x100) containing cross-correlation values, where entry 'ij' is the correlation value between signal 'i' and 'j' and so a high value means that these two signals belong to the same class of objects, and knowing there…

matlab cluster-analysis correlation

asked Sep 02 '12 at 07:26

user1641496

1 answer

Networkx graph clustering

In Networkx, how can I cluster nodes based on node color? E.g., I have 100 nodes, some of them are close to black, while others are close to white. In the graph layout, I want nodes with similar color to stay close to each other, and nodes with very…

python cluster-analysis graphviz data-visualization networkx

asked Mar 02 '12 at 23:56

Geni

1 answer

R Clustering 'purity' metric

I am using the fpc package in R to perform cluster validation. I could use the function cluster.stats() to compare my clustering with an external partitioning and compute several metrics like Rand Index, entropy, etc. However, I am looking for a…

r cluster-analysis

asked Feb 12 '12 at 23:45

chet
