Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby producing a tree of nested clusters. Its tree-structured output makes it easy to visualize, which is a major advantage for analysts.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it usually starts with the entire data set as one cluster and keeps dividing it until each data point sits in its own cluster, or until a user-defined stopping condition is reached.

Another widely known method is AGNES (AGglomerative NESting), which works in the opposite, bottom-up direction: each data point starts as its own cluster, and the closest clusters are merged step by step.
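
As a minimal sketch of the agglomerative (AGNES-style) direction, here is how it might look with SciPy; note that SciPy implements only the bottom-up direction (there is no DIANA in SciPy), and the toy data and average linkage are illustrative assumptions:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Toy data: two loose groups in the plane.
    X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                  [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

    # Bottom-up (AGNES-style) merging of the closest clusters.
    Z = linkage(X, method="average")

    # Cut the tree into two flat clusters.
    print(fcluster(Z, t=2, criterion="maxclust"))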

Distance metrics & some advantages

There are a multitude of ways to measure the distance between clusters, and this choice governs how the techniques split or merge them. Common linkage criteria include complete link and single link, which take the maximum and minimum pairwise distance between two clusters, respectively.
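
For instance, a quick sketch of how the linkage choice is expressed in SciPy (the random data is only a placeholder):

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(0)
    X = rng.random((10, 2))
    D = pdist(X)  # condensed vector of pairwise distances

    # Single link merges on the minimum pairwise distance between clusters,
    # complete link on the maximum; the resulting trees can differ markedly.
    Z_single = linkage(D, method="single")
    Z_complete = linkage(D, method="complete")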

Because it outputs a full hierarchical classification of a dataset, hierarchical clustering lends itself to visualization, which is a real advantage for analysts. Such trees (hierarchies) can be utilized in a myriad of ways.
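
A minimal visualization sketch, assuming SciPy and matplotlib are available (random placeholder data):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(1)
    X = rng.random((12, 3))

    Z = linkage(X, method="ward")
    dendrogram(Z)  # the tree itself is the visualization
    plt.show()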

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), which are known for their ability to discover clusters of unusual shapes (e.g., non-convex shapes).
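
A brief sketch of that contrast using scikit-learn's two-moons toy data (the eps value is an illustrative choice):

    from sklearn.datasets import make_moons
    from sklearn.cluster import DBSCAN, KMeans

    X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

    # k-means assumes roughly spherical clusters and splits the moons badly,
    # while density-based DBSCAN recovers the two crescents.
    km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    db_labels = DBSCAN(eps=0.3).fit_predict(X)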

Suggested learning sources to look into

  • Han, Kamber and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
11 votes · 3 answers

spatial clustering in R (simple example)

I have this simple data.frame: lat<-c(1,2,3,10,11,12,20,21,22,23) lon<-c(5,6,7,30,31,32,50,51,52,53) data=data.frame(lat,lon) The idea is to find the spatial clusters based on the distance. First, I plot the map (lon,lat)…
Math · 1,274
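
The question is in R, but a minimal Python analogue of the idea (cluster points whose pairwise distance falls under a cutoff) might look like this; the threshold of 10 units is an illustrative assumption, not from the question:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    lat = [1, 2, 3, 10, 11, 12, 20, 21, 22, 23]
    lon = [5, 6, 7, 30, 31, 32, 50, 51, 52, 53]
    pts = np.column_stack([lat, lon])

    # Single-link clustering, then cut the tree at a distance threshold.
    Z = linkage(pts, method="single")
    print(fcluster(Z, t=10, criterion="distance"))  # three spatial groups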
11 votes · 1 answer

How do you visualize a ward tree from sklearn.cluster.ward_tree?

In sklearn there is one agglomerative clustering algorithm implemented, the ward method minimizing variance. Usually sklearn is documented with lots of nice usage examples, but I couldn't find examples of how to use this function. Basically my…
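
One possible approach (not necessarily the thread's answer) is to convert ward_tree's output into a SciPy linkage matrix and plot that; this assumes a scikit-learn recent enough for ward_tree to accept return_distance:

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram
    from sklearn.cluster import ward_tree

    rng = np.random.default_rng(0)
    X = rng.random((20, 4))
    children, _, n_leaves, _, distances = ward_tree(X, return_distance=True)

    # SciPy's dendrogram expects rows of [left, right, distance, size].
    counts = np.zeros(children.shape[0])
    for i, (a, b) in enumerate(children):
        counts[i] = (1 if a < n_leaves else counts[a - n_leaves]) \
                  + (1 if b < n_leaves else counts[b - n_leaves])
    Z = np.column_stack([children, distances, counts]).astype(float)
    dendrogram(Z)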
10 votes · 1 answer

Understanding DynamicTreeCut algorithm for cutting a dendrogram

A dendrogram is a data structure used with hierarchical clustering algorithms that groups clusters at different "heights" of a tree - where the heights correspond to distance measures between clusters. After a dendrogram is created from some input…
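
For contrast with the dynamic approach the question asks about, the simple fixed-height cut looks like this in SciPy (random placeholder data; dynamic tree cut instead adapts the cut height per branch):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(2)
    X = rng.random((15, 3))
    Z = linkage(X, method="average")

    # Every branch that joins below height 0.5 becomes one flat cluster.
    labels = fcluster(Z, t=0.5, criterion="distance")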
10 votes · 2 answers

Extract cluster color from output of dendextend::circlize_dendrogram()

I am trying to extract the colors used in the clustering of circlize_dendrogram. Here is some sample code: library(magrittr) library(dendextend) cols <- c("#009000", "#FF033E", "#CB410B", "#3B444B", "#007FFF") dend <- iris[1:40,-5] %>% dist %>%…
Al-Ahmadgaid Asaad · 1,172
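
The question concerns R's dendextend; as a rough Python analogue, SciPy's dendrogram returns the link colors it assigned even when nothing is drawn:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(3)
    Z = linkage(rng.random((10, 2)), method="average")

    # no_plot=True still computes the layout; the returned dict carries
    # the color assigned to each link.
    info = dendrogram(Z, no_plot=True)
    print(info["color_list"])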
10 votes · 0 answers

Using precision recall metric on a hierarchy of recovered clusters

Context: We are two students intending to write a thesis on reverse engineering namespaces using hierarchical agglomerative clustering algorithms. We have a variety of linkage methods and other tweaks to the algorithm that we want to try out. We will…
10 votes · 2 answers

How to traverse a tree from sklearn AgglomerativeClustering?

I have a numpy text file array at: https://github.com/alvations/anythingyouwant/blob/master/WN_food.matrix It's a matrix of distances between terms, and my list of terms is as follows: http://pastebin.com/2xGt7Xjh I used the following code to…
alvas · 115,346
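
A sketch of one way to walk the fitted tree (not necessarily the thread's accepted answer); children_ stores each merge, with ids below n_samples denoting original observations:

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    X = np.random.rand(10, 3)
    model = AgglomerativeClustering(linkage="average").fit(X)
    n_samples = X.shape[0]

    def leaves(node):
        """Recursively collect the sample indices under a tree node."""
        if node < n_samples:  # a leaf: an original observation
            return [node]
        left, right = model.children_[node - n_samples]
        return leaves(left) + leaves(right)

    # The root is the last merge recorded in children_.
    root = n_samples + model.children_.shape[0] - 1
    print(leaves(root))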
10 votes · 0 answers

Distance metric in the Python fastcluster module

I want to do hierarchical clustering with the fastcluster module. When I use the default (Euclidean) distance metric, it works fine: import fastcluster import scipy.cluster.hierarchy distance = spatial.distance.pdist(data) linkage =…
user1680859 · 1,160
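
A sketch of how a non-default metric is usually passed: fastcluster consumes whatever condensed distance vector pdist produces, so the metric choice happens in pdist (the cosine metric here is just an illustrative pick):

    import numpy as np
    import fastcluster
    from scipy.spatial import distance
    from scipy.cluster import hierarchy

    data = np.random.rand(30, 5)
    d = distance.pdist(data, metric="cosine")    # metric chosen here
    Z = fastcluster.linkage(d, method="average")
    labels = hierarchy.fcluster(Z, t=3, criterion="maxclust")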
10 votes · 3 answers

With SciPy, how do I get clustering for k=? when doing hierarchical clustering

So I am using fastcluster with SciPy to do agglomerative clustering. I can do dendrogram to get the dendrogram for the clustering. I can do fcluster(Z, sqrt(D.max()), 'distance') to get a pretty good clustering for my data. What if I want to…
demongolem · 9,474
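
If the goal is a fixed number of clusters rather than a distance threshold, fcluster's maxclust criterion asks for k directly; a minimal sketch with placeholder data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(4)
    Z = linkage(rng.random((25, 3)), method="ward")

    # Ask for (at most) 4 flat clusters instead of guessing a height
    # such as sqrt(D.max()).
    labels = fcluster(Z, t=4, criterion="maxclust")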
9 votes · 2 answers

Sklearn Agglomerative Clustering Custom Affinity

I'm trying to use agglomerative clustering with a custom distance metric (i.e. affinity), since I'd like to cluster a sequence of integers by sequence similarity and not something like the Euclidean distance, which isn't meaningful. My data looks…
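
One common route (an assumption about the intent, not the thread's answer) is to precompute the custom distances and hand the matrix to the model; this assumes scikit-learn 1.2+, where the parameter is named metric (older releases call it affinity):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    seqs = [[1, 2, 3, 4], [1, 2, 4, 4], [9, 9, 8, 7]]

    def seq_dist(a, b):
        """Toy mismatch count, standing in for any custom sequence distance."""
        return float(sum(x != y for x, y in zip(a, b)))

    D = np.array([[seq_dist(a, b) for b in seqs] for a in seqs])

    # Ward linkage needs raw features, so use average/complete linkage
    # with a precomputed matrix.
    model = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                    linkage="average")
    print(model.fit_predict(D))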
9 votes · 2 answers

Find partial membership with KMeans clustering algorithm

I can calculate cluster membership with KMeans pretty easily: open System open System.IO open Utils open Accord open Accord.Math open Accord.MachineLearning let vals = [| [|1.0; 2.0; 3.0; 2.0|] [|1.1; 1.9; 3.1; 4.0|] [|2.0; 3.0; 4.0;…
Steven · 3,238
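
The question is in F# with Accord; as one Python analogue, a mixture model yields the fractional memberships that hard k-means cannot (GaussianMixture here stands in for any soft-assignment method, e.g. fuzzy c-means):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.array([[1.0, 2.0, 3.0, 2.0],
                  [1.1, 1.9, 3.1, 4.0],
                  [2.0, 3.0, 4.0, 4.0],
                  [8.0, 9.0, 9.5, 9.0]])

    gm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print(gm.predict_proba(X))  # each row sums to 1: partial membership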
9 votes · 2 answers

Duelling dendrograms in R (placing dendrograms back to back in R)

Is there any fairly straightforward way of placing two dendrograms 'back to back' in R? The two dendrograms contain the same objects but are clustered in slightly different ways. I need to emphasise how the dendrograms differ. So something like what…
Elizabeth · 6,391
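
In Python one can approximate the effect by mirroring one of two SciPy dendrograms (a sketch of the idea only; aligning the leaf orderings is a separate problem):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(5)
    X = rng.random((10, 4))

    # Two mirrored panels so the leaves of the trees face each other.
    fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    dendrogram(linkage(X, method="single"), orientation="right", ax=ax1)
    dendrogram(linkage(X, method="complete"), orientation="left", ax=ax2)
    plt.show()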
8 votes · 1 answer

HDBSCAN difference between parameters

I'm confused about the difference between the following parameters in HDBSCAN: min_cluster_size, min_samples, cluster_selection_epsilon. Correct me if I'm wrong. For min_samples, if it is set to 7, then clusters formed need to have 7 or more…
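
A sketch of where those knobs sit, using the standalone hdbscan package (the parameter values are illustrative, not recommendations):

    import numpy as np
    import hdbscan

    X = np.random.rand(200, 2)
    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=10,             # smallest grouping kept as a cluster
        min_samples=7,                   # conservativeness of the density estimate
        cluster_selection_epsilon=0.05,  # merge clusters closer than this
    )
    labels = clusterer.fit_predict(X)    # -1 marks noise points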
8 votes · 1 answer

Python - Calculate Hierarchical clustering of word2vec vectors and plot the results as a dendrogram

I've generated a 100D word2vec model using my domain text corpus, merging common phrases, for example (good bye => good_bye). Then I've extracted 1000 vectors of desired words. So I have a 1000 numpy.array like so: …
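
A minimal sketch of the usual pipeline, with random stand-ins for the 1000 extracted vectors (cosine distance is a common choice for word2vec, but an assumption here):

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, dendrogram

    words = [f"word_{i}" for i in range(50)]   # placeholder labels
    vecs = np.random.rand(50, 100)             # placeholder 100-D vectors

    Z = linkage(pdist(vecs, metric="cosine"), method="average")
    dendrogram(Z, labels=words, leaf_font_size=6)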
8 votes · 3 answers

With SciPy dendrogram, can I change the linewidth?

I'm making a big dendrogram using SciPy, and in the resulting dendrogram the line thickness makes it hard to see detail. I want to decrease the line thickness to make it easier to see and more MATLAB-like. Any suggestions? I'm doing: import…
ja.kb.ca · 83
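
One approach that works because SciPy draws dendrogram branches as ordinary matplotlib lines: lower the global line-width rcParam before plotting (a sketch with placeholder data):

    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    matplotlib.rcParams["lines.linewidth"] = 0.5  # thinner branches

    Z = linkage(np.random.rand(30, 4), method="ward")
    dendrogram(Z)
    plt.show()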
8 votes · 2 answers

Where can I find a good set of benchmark clustering datasets with ground truth labels?

I am looking for a clustering dataset with "ground truth" labels for some known natural clustering, preferably with high dimensionality. I found some good candidates here (http://cs.joensuu.fi/sipu/datasets/), but only the Glass and Iris data-sets…