Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby producing a tree of clusters. This tree structure gives hierarchical clustering strong visualization potential for analysts.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it usually starts with the entire data set in one cluster and repeatedly splits it until every data point sits in its own cluster, or until a user-defined stopping condition is reached.

Another widely known method is AGNES (AGlomerative NESting), which works in the opposite direction: it starts with every data point in its own cluster and repeatedly merges the two closest clusters until a single cluster remains, as in the sketch below.
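
As a concrete sketch of the agglomerative (AGNES-style) direction, here is a minimal example using SciPy; the toy data and the choice of average linkage are assumptions made purely for illustration.

# Agglomerative (bottom-up) clustering: every point starts as its own cluster
# and the two closest clusters are merged repeatedly until one cluster remains.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(20, 2)),   # one toy blob around (0, 0)
               rng.normal(5, 1, size=(20, 2))])  # another toy blob around (5, 5)

Z = linkage(X, method="average")                 # the merge tree (linkage matrix)

# Cut the tree into a flat clustering with two clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)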

Distance metrics & some advantages

There are a multitude of ways to compute the distance between clusters, upon which these techniques decide how to split or merge them, such as complete-link and single-link distances, which take the maximum and the minimum pairwise distance between two clusters, respectively.
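
As a small sketch of the difference (toy points assumed), single and complete linkage consume the same pairwise distances but merge clusters on the minimum versus the maximum inter-cluster distance:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0], [10.0, 0.5]])
d = pdist(X)                                 # condensed pairwise distance matrix

Z_single = linkage(d, method="single")       # merge on the minimum pairwise distance
Z_complete = linkage(d, method="complete")   # merge on the maximum pairwise distance

# The merge heights (third column of the linkage matrix) differ between the two.
print(Z_single[:, 2])
print(Z_complete[:, 2])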

Hierarchical clustering provides advantages to analysts through its visualization potential, since its output is a hierarchical classification of the dataset. Such trees (hierarchies) can be used in a myriad of ways.
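
For instance, a minimal sketch of drawing such a tree as a dendrogram with SciPy and matplotlib (the random toy data and the choice of Ward linkage are assumptions for illustration):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))

Z = linkage(X, method="ward")    # build the hierarchy
dendrogram(Z)                    # draw the tree of merges, bottom-up
plt.xlabel("sample index")
plt.ylabel("merge distance")
plt.show()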

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), the latter being known for its ability to discover unusually shaped (e.g. non-spherical) clusters.
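
As a sketch of that last point, DBSCAN on the classic two-moons toy dataset recovers the non-spherical clusters that k-means tends to split; the dataset and the eps value are assumptions for illustration.

from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN, KMeans

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(set(db_labels))   # two density-based clusters; label -1 would mark noise
print(set(km_labels))   # two partitions, but k-means typically mixes the two moons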

Suggested learning sources to look into

  • Han, Kamber, and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
3 votes, 1 answer

Hierarchical cluster analysis help - dendrogram

I wrote code to generate a dendrogram, as you can see in the image, using the hclust function. I would like help interpreting this dendrogram. Note that the locations of these points are close. What does this dendrogram result I'm…
Antonio
3 votes, 1 answer

How can I find the common parent of two clusters created from scipy.cluster.hierarchy.dendrogram?

I have two clusters that are leaves/nodes from a dendrogram created with scipy.cluster.hierarchy.dendrogram. I want to find the closest common parent of the two clusters. However, I do not have the cluster indices as indicated by the dendrogram. I…
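
A sketch of one way to approach this, assuming the two clusters can be identified by their node ids in the linkage matrix Z returned by scipy.cluster.hierarchy.linkage (SciPy numbers leaves 0..n-1 and gives the cluster created by row i the id n + i); the helper names and toy data are mine, not from the question.

import numpy as np
from scipy.cluster.hierarchy import linkage

def ancestors(Z, node_id):
    # Return the path of node ids from node_id up to the root.
    n = Z.shape[0] + 1
    path = [node_id]
    current = node_id
    while current != 2 * n - 2:            # 2n - 2 is the id of the root cluster
        row = np.where((Z[:, 0] == current) | (Z[:, 1] == current))[0][0]
        current = n + row                  # the merge in this row is the parent
        path.append(current)
    return path

def lowest_common_ancestor(Z, a, b):
    anc_a = set(ancestors(Z, a))
    for node in ancestors(Z, b):           # walk up from b until we hit a's path
        if node in anc_a:
            return node

# Toy usage: lowest common ancestor of leaves 0 and 3.
X = np.random.RandomState(0).rand(6, 2)
Z = linkage(X, method="average")
print(lowest_common_ancestor(Z, 0, 3))
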
3 votes, 1 answer

Generating simulated data with intracluster correlation

I have a dataset that looks something like this: d <- structure(list(groupid = c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,…
Alex
3 votes, 2 answers

How to visualize clusters overlaying a circle plot in R?

I have a plot I make using a website called Revigo, which provides an R script (included below) to create a plot like this. Is it possible to perform and visualize a clustering on top of these points in the same graph? Since this…
DN1
3 votes, 1 answer

scipy.cluster.hierarchy.dendrogram(): exactly what does truncate_mode='level' do?

The documentation says, "No more than p levels of the dendrogram tree are displayed. A “level” includes all nodes with p merges from the last merge." (p is another parameter) I can't figure out what "p merges from the last merge" means. Can anyone…
Mark Pundurs
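
My reading of the documented behaviour, as a sketch with toy data: with truncate_mode="level", only the merges within p levels of the final (root) merge are drawn, and everything deeper is collapsed into single leaves labelled with the number of original observations they contain.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.RandomState(0).rand(50, 3)
Z = linkage(X, method="ward")

# Full tree versus the same tree truncated to roughly its top two levels.
dendrogram(Z)
plt.figure()
dendrogram(Z, truncate_mode="level", p=2)
plt.show()
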
3 votes, 0 answers

Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") : missing value where TRUE/FALSE for Gower distance

I am struggling to get hierarchical clustering working in R. Please do not downvote this post; I have tried what is at this link: How to use 'hclust' as function call in R, yet I haven't succeeded. A sample of data is…
GaB
3 votes, 2 answers

warning: uncondensed distance matrix in python

I am trying to make a dendrogram for agglomerative hierarchical clustering and I need the distance matrix. I started with: import numpy as np import pandas as pd from scipy import ndimage from scipy.cluster import hierarchy from…
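
This warning usually means a full square distance matrix was passed where SciPy expects the condensed form produced by pdist; a sketch of the conversion with toy data:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

X = np.random.RandomState(0).rand(10, 4)

square = squareform(pdist(X))      # full 10 x 10 symmetric distance matrix
condensed = squareform(square)     # back to the condensed vector of length 45

Z = linkage(condensed, method="average")   # condensed form: no warning
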
3 votes, 0 answers

How to find few similar vectors in a huge amount of vectors?

Assume a huge number (e.g. a billion) of vectors (e.g. stored in a database). All the vectors have the same number of numerical values (e.g. each vector has 100,000 integer values). There is a distance function that tells the distance between two of…
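
At the stated scale an exact search is impractical and approximate nearest-neighbour libraries (FAISS, Annoy and the like) are the usual answer, but the query pattern can be sketched with scikit-learn's exact NearestNeighbors on a small stand-in; the sizes and metric below are assumptions.

import numpy as np
from sklearn.neighbors import NearestNeighbors

vectors = np.random.RandomState(0).rand(1000, 128)   # small stand-in for the database
query = vectors[42]

nn = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(vectors)
distances, indices = nn.kneighbors(query.reshape(1, -1))
print(indices[0])   # ids of the 5 most similar vectors (the query itself comes first)
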
3 votes, 0 answers

compare multiple signals using FFT in R

I want to analyse multiple signals using the fast Fourier transform and group the ones with similar patterns. I'd like to know how to approach this problem. A subset of my data: df <- dput(tst1) structure(list(var_1 = c(0.238942, 0.265, 0.190338,…
user1946217
3 votes, 1 answer

Find number of clusters using distance matrix with hierarchical clustering

How do I determine the optimal number of clusters when using hierarchical clustering? If I only have the distance matrix, because I am measuring only pairwise (Levenshtein) distances, how do I find the optimal number of clusters? I…
user3570187
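
One common heuristic, sketched here with made-up distances: cut the tree at several candidate cluster counts and keep the cut with the best silhouette score computed directly from the precomputed distance matrix.

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

D = squareform(np.random.RandomState(0).rand(190))   # toy 20 x 20 symmetric distance matrix
Z = linkage(squareform(D), method="average")         # linkage wants the condensed form

best_k, best_score = None, -1.0
for k in range(2, 8):
    labels = fcluster(Z, t=k, criterion="maxclust")
    if len(set(labels)) < 2:                          # skip degenerate cuts
        continue
    score = silhouette_score(D, labels, metric="precomputed")
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)
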
3 votes, 1 answer

Extract path from root to leaf in sklearn's agglomerative clustering

Given some specific leaf node of the agglomerative clustering created by sklearn.AgglomerativeClustering, I am trying to identify the path from the root node (all data points) to the given leaf node and for each intermediate step (internal node of…
St123
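
A sketch of one way to do this from the children_ array that AgglomerativeClustering exposes after fitting: merge i in children_ creates node n_samples + i (leaves are 0..n_samples-1), so the path can be recovered by walking upwards from the leaf. The helper name and toy data are mine.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.RandomState(0).rand(8, 2)
model = AgglomerativeClustering(n_clusters=2, compute_full_tree=True).fit(X)

def path_from_root(children, n_samples, leaf):
    path = [leaf]
    current = leaf
    root = n_samples + len(children) - 1    # the last merge creates the root
    while current != root:
        row = np.where((children[:, 0] == current) | (children[:, 1] == current))[0][0]
        current = n_samples + row           # the merge containing this node is its parent
        path.append(current)
    return path[::-1]                       # reverse so the root comes first

print(path_from_root(model.children_, X.shape[0], leaf=3))
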
3 votes, 0 answers

AgglomerativeClustering on precomputed Sparse Matrix

In my current approach, I have from scipy.sparse import csr_matrix from sklearn.cluster import AgglomerativeClustering import pandas as pd s = pd.DataFrame([[0.8, 0. , 3. ], [1. , 1. , 2. ], [0.3, 3. , 4. ]], columns=['dist', 'v1',…
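
A sketch under the assumption that the distances can be densified: AgglomerativeClustering accepts a full square distance matrix when distances are precomputed, with linkage set to something other than ward. Depending on the scikit-learn version the keyword is affinity="precomputed" (older) or metric="precomputed" (newer); the toy distances below are made up.

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import AgglomerativeClustering

# Toy symmetric distance matrix stored sparsely (zero diagonal).
dense = np.array([[0.0, 0.8, 1.0],
                  [0.8, 0.0, 0.3],
                  [1.0, 0.3, 0.0]])
sparse = csr_matrix(dense)

model = AgglomerativeClustering(
    n_clusters=2,
    metric="precomputed",    # affinity="precomputed" on older scikit-learn versions
    linkage="average",       # ward is not allowed with precomputed distances
)
labels = model.fit_predict(sparse.toarray())   # densify before fitting
print(labels)
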
3 votes, 2 answers

Reordering the high-level clusters from seaborn clustermap results

Is there a way to get from a to b in the following figure with scripting? I am using seaborn.clustermap() to get to a (i.e. the order of the rows is preserved; however, the column order changes only at the second-highest level). I was wondering whether it…
Dataman
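
One possible route, sketched with an assumed dataset: build the column linkage yourself (optionally re-ordered, for example with SciPy's optimal_leaf_ordering) and hand it to clustermap via col_linkage, so the ordering is under your control rather than recomputed internally.

import seaborn as sns
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering

data = sns.load_dataset("iris").drop(columns="species")

col_dist = pdist(data.values.T)
col_linkage = optimal_leaf_ordering(linkage(col_dist, method="average"), col_dist)
row_linkage = linkage(pdist(data.values), method="average")

g = sns.clustermap(data, row_linkage=row_linkage, col_linkage=col_linkage)
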
3 votes, 1 answer

How to get a list of all leaves under a node in a dendrogram?

I made a dendrogram using scipy.cluster.hierarchy.dendrogram, using the following generated data: a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,]) b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,]) c =…
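
A sketch using SciPy's to_tree, which converts the linkage matrix into ClusterNode objects whose pre_order() method returns the leaf ids under that node; the toy data below is assumed.

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

X = np.random.RandomState(0).rand(20, 2)
Z = linkage(X, method="ward")

root, nodes = to_tree(Z, rd=True)   # rd=True also returns a list of every node by id
some_node = nodes[25]               # an internal node (leaves are nodes[0]..nodes[19])
print(some_node.pre_order())        # indices of all original observations in that subtree
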
3 votes, 1 answer

seaborn change clustermap visualization options without redoing the clustering

Is it possible to run seaborn.clustermap on a previously obtained ClusterGrid object? For example, I use clustermap to obtain g in the following example: import seaborn as sns data = sns.load_dataset("iris") species = data.pop("species") g =…
lucacerone
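
One way to avoid re-clustering, sketched with an assumed dataset and styling options: compute the row and column linkages once with SciPy and pass them to every clustermap call via row_linkage / col_linkage, so only the drawing is redone.

import seaborn as sns
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

data = sns.load_dataset("iris")
species = data.pop("species")

row_linkage = linkage(pdist(data.values), method="average")
col_linkage = linkage(pdist(data.values.T), method="average")

# First rendering.
g1 = sns.clustermap(data, row_linkage=row_linkage, col_linkage=col_linkage)

# Same clustering, different visual options: nothing is re-clustered.
g2 = sns.clustermap(data, row_linkage=row_linkage, col_linkage=col_linkage,
                    cmap="vlag", standard_scale=1)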