Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby producing a tree of clusters (a dendrogram). Because the result is a tree, it is easy for analysts to visualize and inspect.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it usually starts from the entire data set and repeatedly divides it until each data point resides in its own cluster or a user-defined stopping condition is met.

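Another widely known method is AGNES (AGglomerative NESting), which works in the opposite direction to DIANA: it starts with each data point in its own cluster and repeatedly merges the closest clusters, bottom up.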
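
The bottom-up procedure can be sketched in a few lines of plain Python. This is a naive illustration rather than the reference AGNES implementation: the function names `agnes` and `single_link`, the choice of single-link distance, and the sample points are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Smallest pairwise Euclidean distance between two clusters of points."""
    return min(dist(p, q) for p in a for q in b)


def agnes(points, linkage=single_link):
    """Naive bottom-up clustering (hypothetical helper, not a library API).

    Returns the merge history as a list of (cluster_a, cluster_b, distance).
    """
    clusters = [(p,) for p in points]  # start: every point is its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters that is closest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history


# Made-up sample data: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for a, b, d in agnes(points):
    print(a, "+", b, "at distance", round(d, 3))
```

The merge history is the tree: each entry records which two clusters were joined and at what distance. The pair search re-scans all clusters on every step, so this sketch is only suitable for small inputs.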

Distance metric & some advantages

There are a multitude of ways to compute the distance between clusters, which determines how the clustering techniques divide existing clusters or merge them into new ones. Examples include complete-link and single-link distances, which take the maximum and minimum pairwise distance between the points of two clusters, respectively.
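
As a small illustration of the two linkage criteria named above, the following sketch computes both for a pair of clusters. The helper names `single_link` and `complete_link`, the use of Euclidean distance, and the sample points are assumptions for the example.

```python
from math import dist


def single_link(a, b):
    """Minimum pairwise distance between the points of two clusters."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Maximum pairwise distance between the points of two clusters."""
    return max(dist(p, q) for p in a for q in b)


# Made-up clusters lying on the x axis.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]

print(single_link(left, right))    # closest pair: (1, 0) and (4, 0)
print(complete_link(left, right))  # farthest pair: (0, 0) and (6, 0)
```

Single link tends to chain nearby clusters together, while complete link favors compact clusters, so the choice changes the shape of the resulting tree.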

Hierarchical clustering offers analysts strong visualization potential, since its output is a hierarchical classification of the dataset. Such trees (hierarchies) can be used in many ways, for example by cutting them at a chosen height to obtain a flat set of clusters.
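
One common use of the tree is to cut it at a given height and read off flat clusters. The sketch below assumes a hypothetical merge format, a list of `(point_i, point_j, distance)` entries meaning "the clusters containing these two points were joined at this distance", and assumes merge distances never decrease up the tree, as with single or complete link. The function name `cut_tree` and the sample merges are illustrative.

```python
def cut_tree(n_points, merges, height):
    """Flat clusters obtained by keeping only merges at or below `height`.

    `merges` uses a hypothetical format: (point_i, point_j, distance).
    """
    parent = list(range(n_points))  # union-find forest, one root per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, d in merges:
        if d <= height:
            parent[find(i)] = find(j)  # join the two clusters

    groups = {}
    for p in range(n_points):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Made-up merge history for four points, indexed 0..3.
merges = [(0, 1, 1.0), (2, 3, 1.5), (1, 2, 6.4)]
print(cut_tree(4, merges, height=2.0))   # two clusters
print(cut_tree(4, merges, height=10.0))  # everything in one cluster
```

Lowering the height yields more, smaller clusters; raising it yields fewer, larger ones, which is why the number of clusters does not have to be fixed in advance.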

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), the latter known for their ability to discover unusual cluster shapes (such as non-circular ones).

Suggested learning sources to look into

  • Han, Kamber and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
5
votes
0 answers

Display huge (about 50k and more) amount of markers on map (with clustering)

I want to show markers on the map in a manner similar to this example, but I have more markers and I want to use "native" Google Map. So far I have found only 1 library that looked capable of doing this: android-maps-utils. But after testing I…
Display Name
  • 8,022
  • 3
  • 31
  • 66
5
votes
1 answer

What clustering algorithm is suitable for 2d rectangles without knowing the number of clusters ahead of time?

The problem I have is that there are rectangles within rectangles. Think of a map, except with the following traits with the key point being: rectangles with similar density often share similar dimensions and similar position on the x axis with…
javastudent
  • 359
  • 1
  • 4
  • 12
5
votes
3 answers

Extracting Dominant / Most Used Colors from an Image

I would like to extract the most used colors inside an image, or at least the primary tones. Could you recommend how I can start with this task, or point me to similar code? I have been looking for it, but with no success.
5
votes
6 answers

how to create a heatmap with a fixed external hierarchical cluster

I have matrix data and want to visualize it with a heatmap. The rows are species, so I want to visualize the phylogenetic tree beside the rows and reorder the rows of the heatmap according to the tree. I know the heatmap function in R can create the…
RNA
  • 146,987
  • 15
  • 52
  • 70
5
votes
1 answer

Error with multiscale hierarchical clustering in R

I'm doing hierarchical clustering with an R package called pvclust, which builds on hclust by incorporating bootstrapping to calculate significance levels for the clusters obtained. Consider the following data set with 3 dimensions and 10…
5
votes
1 answer

Generating a heatmap that depicts the clusters in a dataset using hierarchical clustering in R

I am trying to take my dataset, which is made up of protein-DNA interactions, cluster the data, and generate a heatmap that displays the resulting data such that the data looks clustered, with the clusters lining up on the diagonal. I am able to…
Alos
  • 2,657
  • 5
  • 35
  • 47
4
votes
2 answers

How to persist community information in a graph

I have some graph databases (friends networks, purchasing history, etc.) that I persist with Neo4j. I plan to analyze these with community detection algorithms such as Girvan Newman. These algorithms usually return a dendrogram, representing the…
Paul Jackson
  • 2,077
  • 2
  • 19
  • 29
4
votes
2 answers

Matching up the output of scipy linkage() and dendrogram()

I'm drawing dendrograms from scratch using the Z and P outputs of code like the following (see below for a fuller example): Z = scipy.cluster.hierarchy.linkage(...) P = scipy.cluster.hierarchy.dendrogram(Z, ..., no_plot=True) and in order to do…
nicolaskruchten
  • 26,384
  • 8
  • 83
  • 101
4
votes
1 answer

Grouping/Clustering Rectangles

I have a list of shapes (list of points) e.g. rectangles which I want to group/cluster together. This is what I have: And this is what I want to achieve. How to do it? I already looked at some clustering techniques, e.g., kmeans but it seems there…
dknaack
  • 60,192
  • 27
  • 155
  • 202
4
votes
3 answers

How do I label the terminal nodes of a cut dendrogram?

I used the following code to cut the dendrogram at a particular height. The problem I'm having is that when I cut a dendrogram, I can't figure out how to add labels to the nodes. How can I cut a dendrogram with labels using R…
akash
  • 41
  • 1
  • 3
4
votes
1 answer

Problem with margins using plot function with as.dendrogram object

I'm trying to customize a clustering plot using both base R functions and the package "dendextend". Firstly I generate a cluster with the common hclust() function. Then I'm using "dendextend" to color the branches defined by k=groups. Then I'm using…
4
votes
2 answers

How can I plot a truncated dendrogram plot using plotly?

I want to plot a dendrogram for hierarchical clustering using plotly and show only a small subset of the plot, because with a large number of samples the plot can be very dense at the bottom. I have plotted the plot using the plotly wrapper function…
Albin
  • 61
  • 7
4
votes
0 answers

AgglomerativeClustering setting up distance_threshold

I have a dataset and I want to use AgglomerativeClustering to find clusters. I tried with a sample array, but I'm not able to figure out how to set the distance_threshold. I thought of using this as I'm not aware of the number of clusters for similar…
A3006
  • 1,051
  • 1
  • 11
  • 28
4
votes
1 answer

SwiftUI Large Array (over 30K elements) taking forever to iterate over

I'm so confused as to what's going on, but basically, this is my code. Maybe I'm just stupid or don't know enough about Swift or something, but I feel like this should take less than a second, yet it takes a very long time to iterate through (I added the…
4
votes
2 answers

Hierarchical clusterization heuristics

I want to explore relations between data items in a large array. Every data item is represented by a multidimensional vector. First of all, I've decided to use clusterization. I'm interested in finding hierarchical relations between clusters (groups of…