Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby generating a tree of clusters. Hierarchical clustering provides advantages to analysts with its visualization potential.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it usually starts from the entire data set and repeatedly splits clusters until each data point resides in its own cluster, or until a user-defined stopping condition is reached.

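Another widely known method is AGNES (AGglomerative NESting), which performs the opposite of DIANA: it works bottom-up, starting with each data point in its own cluster and repeatedly merging the two closest clusters until a single cluster remains (or a stopping condition is reached).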
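The bottom-up (AGNES-style) procedure can be sketched in a few lines of plain Python. This is a deliberately naive illustration (roughly cubic time), not the reference AGNES implementation; the function name `agnes`, the `linkage` parameter and the sample points are hypothetical. DIANA's top-down splitting is not shown.

```python
import math
from itertools import combinations


def agnes(points, linkage="single"):
    """Naive bottom-up (AGNES-style) clustering.

    points: list of coordinate tuples.
    linkage: "single" (minimum pairwise distance) or "complete" (maximum).
    Returns the merges as (cluster_a, cluster_b, distance) tuples, where each
    cluster is a frozenset of point indices.
    """
    def cluster_dist(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        return min(pair_dists) if linkage == "single" else max(pair_dists)

    # Start with every point in its own cluster
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((a, b, cluster_dist(a, b)))
        # Replace the two clusters with their union
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]  # hypothetical 1-D data
for a, b, d in agnes(points):
    print(sorted(a), sorted(b), d)
```

The list of merges, read in order, is exactly the information a dendrogram draws: which clusters were joined and at what height.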

Distance metrics and some advantages

There are many ways to compute the distance between clusters, which determines how the clustering techniques split or merge them into new clusters. For example, complete-link and single-link distances use, respectively, the maximum and the minimum pairwise distance between points of the two clusters.
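A small sketch of the two linkage distances, assuming Euclidean distance between points (any other point metric could be swapped in); the helper names and the two example clusters are hypothetical.

```python
import math


def single_link(a, b):
    """Minimum pairwise Euclidean distance between clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Maximum pairwise Euclidean distance between clusters a and b."""
    return max(math.dist(p, q) for p in a for q in b)


cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (7.0, 0.0)]
print(single_link(cluster_a, cluster_b))    # 3.0, from (1, 0) to (4, 0)
print(complete_link(cluster_a, cluster_b))  # 7.0, from (0, 0) to (7, 0)
```

Single linkage tends to chain elongated groups together, while complete linkage favors compact clusters, so the choice changes the shape of the resulting tree.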

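Hierarchical clustering provides advantages to analysts with its visualization potential, since its output is a hierarchical classification of the dataset, typically drawn as a dendrogram. Such trees can be used in many ways; for example, cutting the tree at a chosen height, or into a chosen number of clusters, yields a flat clustering without rerunning the algorithm.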
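As a sketch of cutting a tree into a flat clustering, assuming SciPy is available (the tag description does not name a library) and using made-up 1-D data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 1-D data: two tight pairs and one outlier
X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])

# Build the hierarchy bottom-up (average linkage, Euclidean distance)
Z = linkage(X, method="average")

# Cut the tree so that at most 3 flat clusters remain
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

The same linkage matrix `Z` can be cut again with a different `t` (or with `criterion="distance"` for a height) at no extra clustering cost.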

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), the latter known for their ability to discover clusters of arbitrary shape (for example, non-convex or elongated shapes).
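To show why density-based methods can follow unusual shapes, here is a minimal DBSCAN sketch in plain Python. It is an illustrative, quadratic-time version (not a library implementation); the function name, the `eps`/`min_pts` values and the points are hypothetical, and the neighborhood count includes the point itself.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself)
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                # j is a core point: keep expanding from its neighborhood
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


# Hypothetical data: an elongated chain, a short chain, and one isolated point
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (50, 0)]
print(dbscan(pts, eps=1.5, min_pts=2))
```

Because clusters grow through chains of dense neighborhoods rather than around a center, the elongated chain stays a single cluster and the isolated point is reported as noise.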

Suggested learning sources to look into

  • Han, Kamber and Pei's book Data Mining: Concepts and Techniques, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
4 votes, 2 answers

Unable to find an inherited method for function ‘species’ for signature ‘"character"’

I'm trying to perform a GSEA analysis following this pipeline: https://learn.gencore.bio.nyu.edu/rna-seq-analysis/gene-set-enrichment-analysis/ However, when I run the code, the following message appears: Error in (function (classes, fdef, mtable) :…
4 votes, 2 answers

Retrieve leaf colors from SciPy dendrogram

I cannot get the leaf colors from the SciPy dendrogram dictionary. As stated in the documentation and in this GitHub issue, the color_list key in the dendrogram dictionary refers to the links, not the leaves. It would be nice to have another key…
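A possible sketch for this question, assuming a recent SciPy: as far as I know, SciPy 1.7 and later add a `leaves_color_list` key to the dictionary returned by `dendrogram`, listing one color per leaf in the same order as `leaves`. Older versions lack this key. The data below is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])  # hypothetical data
Z = linkage(X, method="single")

# no_plot=True computes the dendrogram layout without drawing anything
R = dendrogram(Z, no_plot=True)

# 'leaves' gives the original observation indices in plot order;
# 'leaves_color_list' (SciPy >= 1.7) gives each leaf's color in that order
leaf_colors = dict(zip(R["leaves"], R["leaves_color_list"]))
print(leaf_colors)
```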
4 votes, 1 answer

Pass distance matrix to seaborn clustermap

I want to pass my own distance matrix (row linkages) to seaborn clustermap. There are already some posts on this, like "Use Distance Matrix in scipy.cluster.hierarchy.linkage()?", but they all point to SciPy's hierarchy linkage, which takes the clustering…
Mario L (reputation 507)
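A sketch of the usual route, assuming a symmetric, precomputed distance matrix with a zero diagonal (the matrix below is hypothetical): `linkage` treats a 1-D array as a condensed distance matrix, so convert the square matrix with `squareform` first rather than passing it directly, which would make SciPy treat its rows as observation vectors.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical precomputed distance matrix (symmetric, zero diagonal)
D = np.array([
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 1.0],
    [6.0, 5.0, 1.0, 0.0],
])

# Convert the square matrix to condensed (upper-triangle) form
condensed = squareform(D, checks=True)

# Build the hierarchy from the distances themselves
row_linkage = linkage(condensed, method="average")
print(row_linkage)
```

The resulting matrix can then be handed to seaborn through `clustermap`'s `row_linkage=` (and, if needed, `col_linkage=`) arguments, provided the data's rows are in the same order as `D`.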
4 votes, 0 answers

How do I work with data once it has been clustered?

I've clustered some stocks based on the Ward algorithm, and now I want to take the clusters and pair up the stocks inside each cluster. I've found ways to write a dictionary containing all the stocks and what color of the dendrogram they are associated…
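One way to work with the result, sketched under the assumption that flat cluster labels are available (for example from SciPy's `fcluster`) instead of dendrogram colors: group the names by label, then list every pair inside each group. The tickers, labels and helper name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairs_within_clusters(names, labels):
    """Group names by flat-cluster label and list every within-cluster pair."""
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return {label: list(combinations(members, 2)) for label, members in groups.items()}


# Hypothetical tickers and labels, e.g. as produced by fcluster
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE"]
labels = [1, 1, 2, 2, 2]
result = pairs_within_clusters(tickers, labels)
print(result)
# {1: [('AAA', 'BBB')], 2: [('CCC', 'DDD'), ('CCC', 'EEE'), ('DDD', 'EEE')]}
```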
4 votes, 0 answers

How to cluster points based on the function they belong in Python?

Sorry if the title is ambiguous; let me explain the problem. By the way, I'm really new to data science, so sorry if I make a statement that doesn't make sense. I recently came across a problem related to clustering. The coordinates were…
Mansur (reputation 1,661)
4 votes, 1 answer

Recommended algorithm for time-based clustering

I am not very knowledgeable about time-based clustering and am wondering whether any algorithms are well suited to my use case. I have a set of exertion data (ranging from 0 to 500) and I want to cluster it along time intervals. My problem is that I want to find…
mornindew (reputation 1,993)
4 votes, 2 answers

How to cluster within clusters

I have a set of points on a map, each with a given parameter value. I would like to: Cluster them spatially and ignore any clusters having fewer than 10 points. My df should have a column (Clust) for the cluster each point belongs to [DONE]…
val (reputation 1,629)
4 votes, 1 answer

Cutting SciPy hierarchical dendrogram into clusters on multiple threshold values

I would like to cut my SciPy dendrogram into a number of clusters on multiple threshold values. I've tried using fcluster, but it can cut on only one threshold value. (Here is a piece of code which I have taken from another question for…
mux032 (reputation 65)
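A sketch for several thresholds at once, assuming SciPy's `cut_tree`, which accepts an array of heights and returns one column of labels per height (a loop over `fcluster(Z, t, criterion="distance")` would be an alternative). The data and thresholds are hypothetical; thresholds are chosen away from the exact merge heights, where the two functions may treat ties differently.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])  # hypothetical data
Z = linkage(X, method="single")

thresholds = np.array([0.5, 2.0, 10.0, 30.0])
# Result has shape (n_observations, n_thresholds): one labeling per height
labels = cut_tree(Z, height=thresholds)
for h, column in zip(thresholds, labels.T):
    print(h, column)
```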
4 votes, 1 answer

r corrplot with clustering: default dissimilarity measure for correlation matrix

I used the R package corrplot to visualize the correlation matrix from my data. I involved the clustering of variables using the embedded option hclust. The invocation of the command was like this (plus various arrangements of titles, axes…
astrsk (reputation 375)
4 votes, 2 answers

Dendextend: Regarding how to color a dendrogram’s labels according to defined groups

I'm trying to use an awesome R package named dendextend to plot a dendrogram and color its branches and labels according to a set of previously defined groups. I've read your answers on Stack Overflow and the FAQ in the dendextend vignette, but I'm…
JLLavin (reputation 91)
4 votes, 2 answers

Hierarchical clustering for categorical data in python

I have categorical attributes that contain string values. Three of them contain the day name (mon to sun), the month name and a time interval (morning, afternoon, evening); the other two, as I mentioned before, contain district and street names, followed by gender…
Nhqazi (reputation 732)
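A sketch of one common approach for purely categorical records, assuming SciPy: integer-encode each attribute, use the Hamming distance (the fraction of attributes whose categories differ, i.e. a simple-matching dissimilarity), and cluster with a linkage that accepts arbitrary distances, such as average linkage. Ward linkage assumes Euclidean geometry and is avoided here. The records are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical records: (day name, time of day, district)
records = [
    ("mon", "morning", "north"),
    ("mon", "morning", "south"),
    ("sat", "evening", "east"),
    ("sun", "evening", "east"),
]

# Integer-encode each column; Hamming distance only checks equality,
# so the arbitrary order of the codes does not matter
columns = list(zip(*records))
codes = np.array([[sorted(set(col)).index(v) for v in col] for col in columns]).T

# Fraction of attributes that differ between each pair of records
D = pdist(codes, metric="hamming")
Z = linkage(D, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```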
4 votes, 3 answers

Algorithmic complexity of group average clustering

I've been reading lately about various hierarchical clustering algorithms such as single-linkage clustering and group average clustering. In general, these algorithms don't tend to scale well. Naive implementations of most hierarchical clustering…
Siler (reputation 8,976)
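For reference, group average (UPGMA) linkage defines the distance between clusters A and B as the mean pairwise distance (first line). It does not have to be recomputed from scratch after each merge: the Lance-Williams update (second line) gives the distance from any cluster K to a newly merged cluster I ∪ J from two stored distances in constant time. A naive implementation that keeps the full n × n matrix and scans it for the closest pair at each of the n − 1 merges therefore takes O(n³) time and O(n²) memory; because average linkage satisfies the reducibility property, the nearest-neighbor chain algorithm can bring the time down to O(n²).

```latex
d_{\text{avg}}(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)

d_{\text{avg}}(K, I \cup J) = \frac{|I|\, d_{\text{avg}}(K, I) + |J|\, d_{\text{avg}}(K, J)}{|I| + |J|}
```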
4 votes, 1 answer

Agglomerative hierarchical clustering technique

Earlier in the year, my AI lecturer taught us about agglomerative hierarchical clustering and k-means clustering, but his explanations have been lost, and I'm trying to figure out how he uses the data in the table below to create a dendrogram. It would be…
4 votes, 1 answer

Leader clustering algorithm explanation

I am trying to understand this algorithm, but I haven't been able to find proper documentation or explanations. Can someone please help me understand this clustering algorithm?
Rndp13 (reputation 1,094)
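A sketch of one common formulation of leader clustering: a single pass over the data in which each point joins the first existing leader within a distance threshold, or otherwise becomes the leader of a new cluster. Variants differ (some assign to the nearest qualifying leader instead); the function name, Euclidean distance and sample points are assumptions for illustration.

```python
import math


def leader_clustering(points, threshold):
    """Single-pass leader clustering.

    Each point joins the first existing leader within `threshold`;
    otherwise it becomes the leader of a new cluster.
    Returns (leaders, labels).
    """
    leaders = []
    labels = []
    for p in points:
        for k, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(k)
                break
        else:
            # No leader close enough: p starts a new cluster
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels


pts = [(0, 0), (1, 0), (10, 0), (0.5, 0), (11, 0)]  # hypothetical data
leaders, labels = leader_clustering(pts, threshold=2)
print(leaders, labels)
```

The algorithm needs only one pass and stores only the leaders, which makes it fast, but the result depends on the order in which points are presented.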
4 votes, 2 answers

Cutting Dendrogram/Clustering Tree from SciPy at distance height

I'm trying to learn how to use dendrograms in Python using SciPy. I want to get clusters and be able to visualize them; I heard hierarchical clustering and dendrograms are the best way. How can I "cut" the tree at a specific distance? In this…
O.rka (reputation 29,847)
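A sketch of a cut at a fixed height, assuming SciPy's `fcluster` with `criterion="distance"`, which forms flat clusters whose members have a cophenetic distance no greater than `t`. The data and the height 2.0 are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])  # hypothetical data
Z = linkage(X, method="single")

# Cut the tree at height t: merges above t are undone
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
```

To visualize the same cut, drawing the dendrogram with `color_threshold` set to the same height should color the resulting clusters, apart from possible differences when a merge height equals the threshold exactly.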