Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby generating a tree of clusters. Hierarchical clustering provides advantages to analysts with its visualization potential.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it usually starts from the entire data set and repeatedly divides it until each data point resides in its own cluster or a user-defined stopping condition is reached.

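Another widely known method is AGNES (AGglomerative NESting), which performs the opposite of DIANA: it works bottom-up, starting with each data point in its own cluster and repeatedly merging the closest clusters.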

Distance metric & some advantages

There are a multitude of ways to compute the distance between clusters, on which the clustering techniques base their decision to divide clusters or merge them into new ones. Examples are the complete-link and single-link distances, which take the maximum and the minimum pairwise distance between the points of two clusters, respectively.
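To make the difference between the two linkage criteria concrete, here is a minimal sketch of bottom-up (agglomerative) clustering in plain Python. The function names `linkage_distance` and `agglomerate` and the four sample points are made up for illustration, and the brute-force search is only suitable for tiny inputs; it is not how AGNES or any particular library is implemented.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b, given as lists of point indices."""
    pairwise = [dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # closest pair of points
        return min(pairwise)
    if method == "complete":    # farthest pair of points
        return max(pairwise)
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, method="single"):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Brute-force search for the closest pair of clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], points, method),
        )
        height = linkage_distance(a, b, points, method)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
        merges.append((a, b, height))
    return merges


# Hypothetical sample data: four points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
single_merges = agglomerate(points, "single")
complete_merges = agglomerate(points, "complete")
print([h for _, _, h in single_merges])
print([h for _, _, h in complete_merges])
```

Both runs merge the same clusters in the same order on this data, but the heights at which the merges happen differ, because complete link measures the farthest pair of points while single link measures the closest pair.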

Hierarchical clustering provides advantages to analysts with its visualization potential, given its output of the hierarchical classification of a dataset. Such trees (hierarchies) could be utilized in a myriad of ways.
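One common way to use such a tree is to cut it at a chosen height and read off flat clusters. The sketch below assumes the hierarchy is given as a list of `(left, right, height)` merges over leaf indices; that format, the function name `cut_tree`, and the sample merge list are illustrative assumptions, not a library format. It also assumes merge heights never decrease going up the tree, which holds for single and complete link.

```python
def cut_tree(n_leaves, merges, max_height):
    """Return flat cluster labels using only merges at or below max_height."""
    parent = list(range(n_leaves))  # union-find forest over the leaves

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for left, right, height in merges:
        if height <= max_height:
            parent[find(right[0])] = find(left[0])

    # Number the clusters in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_leaves)]


# Hypothetical merge history for four leaves.
merges = [([0], [1], 1.0), ([2], [0, 1], 2.0), ([3], [0, 1, 2], 4.0)]
labels_low = cut_tree(4, merges, 0.5)
labels_mid = cut_tree(4, merges, 1.5)
labels_high = cut_tree(4, merges, 2.5)
print(labels_low, labels_mid, labels_high)
```

Raising the cut height yields fewer, larger clusters, so a single tree gives the analyst a whole family of clusterings to choose from.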

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), which are known for their ability to discover clusters of unusual (e.g. non-circular) shapes.

Suggested learning sources to look into

  • Han, Kamber and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
8
votes
1 answer

Is there an efficient way to cluster a graph according to Jaccard similarity?

Is there an efficient way to cluster nodes in a graph using Jaccard similarity such that each cluster has at least K nodes? Jaccard similarity between nodes i and j: Let S be the set of neighbours of i and T be the set of neighbours of j. Then the…
HHH
  • 6,085
  • 20
  • 92
  • 164
7
votes
1 answer

Can't get scipy hierarchical clustering to work

I wrote a simple script that is intended to do hierarchical clustering on a simple test dataset. I found the function fclusterdata to be a candidate to cluster my data into two clusters. It takes two mandatory call parameters: the data set and a…
moooeeeep
  • 31,622
  • 22
  • 98
  • 187
7
votes
2 answers

How to get the optimal number of clusters using hierarchical cluster analysis automatically in python?

I want to use hierarchical cluster analysis to get the optimal number (K) of clusters automatically, then apply this K to K-means clustering in python. After studying many articles, I know some methods tell us that we can plot the graph to determine…
yichun
  • 85
  • 1
  • 1
  • 6
7
votes
2 answers

HDBSCAN Python choose number of clusters

Is it possible to select the number of clusters in the HDBSCAN algorithm in python? Or is the only way to play around with the input parameters such as alpha, min_cluster_size? Thanks UPDATE: here is the code to use fcluster and hdbscan import…
user1571823
  • 394
  • 5
  • 20
7
votes
3 answers

sklearn Hierarchical Agglomerative Clustering using similarity matrix

Given a distance matrix, with similarity between various professors : prof1 prof2 prof3 prof1 0 0.8 0.9 prof2 0.8 0 0.2 prof3 0.9 0.2 0 I need to perform…
ICoder
  • 149
  • 1
  • 3
  • 9
7
votes
1 answer

How to assign clusters to new observations (test data) using hierchical clustering?

from scipy.cluster.hierarchy import dendrogram, linkage,fcluster import numpy as np import matplotlib.pyplot as plt # data np.random.seed(4711) # for repeatability of this tutorial a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]],…
muon
  • 12,821
  • 11
  • 69
  • 88
7
votes
3 answers

Newick tree representation to scipy.cluster.hierarchy linkage matrix format

I have a set of genes which have been aligned and clustered based on DNA sequences, and I have this set of genes in a Newick tree representation (https://en.wikipedia.org/wiki/Newick_format). Does anyone know how to convert this format to the…
themantalope
  • 1,040
  • 11
  • 42
7
votes
1 answer

How do I weight variables with gower distance in r

I am new to R and am working on a data set including nominal, ordinal and metric data. Therefore I am using the gower distance. In the next step I use this distance with hclust(x, method="complete") to create clusters based on this distance. Now I…
user3231946
  • 73
  • 1
  • 5
7
votes
2 answers

Scipy dendrogram leaf label colours

Is it possible to assign colours to leaf labels of dendrogram plots from Scipy? I can't figure it out from the documentation. Here's what I've tried so far: from scipy.spatial.distance import pdist, squareform from scipy.cluster.hierarchy import…
herrfz
  • 4,814
  • 4
  • 26
  • 37
7
votes
1 answer

cluster presentation dendrogram alternative in r

I know dendrograms are quite popular. However if there are quite large number of observations and classes it hard to follow. However sometime I feel that there should be better way to present the same thing. I got an idea but do not know how to…
fprd
  • 621
  • 7
  • 21
6
votes
2 answers

Strange error of Hierarchical Clustering in R

My R program is as below: hcluster <- function(dmatrix) { imatrix <- NULL hc <- hclust(dist(dmatrix), method="average") for(h in sort(unique(hc$height))) { hc.index <- c(h,as.vector(cutree(hc,h=h))) imatrix <-…
Kevin
  • 2,191
  • 9
  • 35
  • 49
6
votes
2 answers

Dendogram Coloring by groups

I created a heatmap based on Spearman's correlation matrix using seaborn clustermap as follows: I want to paint the dendrogram. I want the dendrogram to look like this: dendrogram but on the heatmap I created a dict of colors as follows and got an…
6
votes
2 answers

MapBox supercluster wrong cluster locations

I'm making a clustered map with mapbox supercluster. The problem I face is that clusters are not at the correct location. For example I only have dogs in the Netherlands but when zoomed out they are in France as well. When I zoom further the…
Jenssen
  • 1,801
  • 4
  • 38
  • 72
6
votes
1 answer

NLP bag-of-words/TF-IDF for clustering (and classifying) short sentences

I want to cluster Javascript objects by one of their string key values (description). I already tried multiple solutions and would like some guidance on how to approach the problem. What I want: Let's say I have a database of objects. There can be a…
6
votes
1 answer

what is the meaning of the return values of the scipy.cluster.hierarchy.linkage?

Let's assume that we have X matrix as follows: [[9 0] [1 4] [2 3] [8 5]] Then, from scipy.cluster.hierarchy import linkage Z = linkage(X, method="ward") print(Z) The returned matrix is as follows: [[ 1. 2. 1.41421356 2. …