Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby generating a tree of clusters. Hierarchical clustering provides advantages to analysts with its visualization potential.


Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering. It starts from the entire data set as a single cluster and repeatedly divides it. It stops when each data point resides in its own cluster, or when a user-defined stopping condition is reached.
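Below is a minimal, unoptimised Python sketch of the divisive procedure. It assumes Euclidean distance and uses a target number of clusters as the user-defined stopping condition. At each step it picks the cluster with the largest diameter and seeds a "splinter group" with the point that has the highest average dissimilarity to the others. It then moves over, one at a time, the points that are on average closer to the splinter group than to the rest. The function names (`diana`, `split`, `diameter`) are illustrative and are not taken from any library.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

def diameter(cluster: List[int], points: Sequence[Sequence[float]]) -> float:
    """Largest pairwise distance inside a cluster (0.0 for a singleton)."""
    return max((dist(points[i], points[j]) for i, j in combinations(cluster, 2)), default=0.0)

def split(cluster: List[int], points) -> Tuple[List[int], List[int]]:
    """Split one cluster into a 'splinter group' and the remaining points."""
    def avg(i: int, group: List[int]) -> float:
        # Mean distance from point i to the members of group other than i itself
        others = [j for j in group if j != i]
        return sum(dist(points[i], points[j]) for j in others) / len(others)

    remainder = list(cluster)
    # Seed the splinter group with the point most dissimilar to all the others
    seed = max(remainder, key=lambda i: avg(i, remainder))
    splinter = [seed]
    remainder.remove(seed)
    while len(remainder) > 1:
        # A positive gain means point i is, on average, closer to the splinter group
        gains = {i: avg(i, remainder) - avg(i, splinter) for i in remainder}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        remainder.remove(best)
    return splinter, remainder

def diana(points, n_clusters: int) -> List[List[int]]:
    """Top-down clustering: start from one cluster and split until n_clusters remain."""
    clusters = [list(range(len(points)))]
    while len(clusters) < n_clusters:
        # Split the widest cluster (ties broken towards larger clusters)
        target = max(clusters, key=lambda c: (diameter(c, points), len(c)))
        if len(target) < 2:
            break  # every point already sits in its own cluster
        clusters.remove(target)
        clusters.extend(split(target, points))
    return clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(sorted(c) for c in diana(pts, 2)))  # [[0, 1, 2], [3, 4, 5]]
```

Asking for more clusters than there are points simply stops once every point is a singleton.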

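Another widely known method is AGNES (AGglomerative NESting), which performs the opposite of DIANA. It starts with every data point in its own cluster and repeatedly merges the two closest clusters. It stops when all points belong to a single cluster, or when a stopping condition is reached.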
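Below is a naive Python sketch of the agglomerative direction. It assumes Euclidean distance and uses average linkage (the mean pairwise distance between two clusters) as the merge criterion. The `n_clusters` stopping condition and the function names are illustrative assumptions. The sketch repeatedly merges the closest pair of clusters and records each merge with its height, which is the information a dendrogram draws.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

def average_link(a: List[int], b: List[int], points: Sequence[Sequence[float]]) -> float:
    """Average linkage: mean distance over all cross-cluster point pairs."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agnes(points, n_clusters: int = 1):
    """Bottom-up clustering: start from singletons and merge the closest pair each step.

    Returns the final clusters and the merge history as (cluster_a, cluster_b, height).
    """
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    merges: List[Tuple[List[int], List[int], float]] = []
    while len(clusters) > max(n_clusters, 1):
        # Closest pair of clusters under average linkage (naive search over all pairs)
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: average_link(clusters[ab[0]], clusters[ab[1]], points))
        height = average_link(clusters[a], clusters[b], points)
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return clusters, merges

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, merges = agnes(pts, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
print(len(merges))                          # 4
```

The search recomputes every pairwise linkage on each step, so the sketch is only suitable for small inputs. Library implementations use much more efficient update schemes.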

Distance metrics and some advantages

There are a multitude of ways to compute the distance that clustering techniques use to decide how to split or merge clusters. Two common ones are the complete-linkage and single-linkage distances. They take, respectively, the maximum and the minimum distance between a point in one cluster and a point in the other.
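The two linkage criteria can be written directly as the minimum and the maximum over all cross-cluster point pairs. This small sketch assumes Euclidean distance between points. The example clusters `A` and `B` are made up for illustration.

```python
from itertools import product
from math import dist

def single_link(a, b):
    """Single linkage: distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """Complete linkage: distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0), (5.0, 1.0)]
print(round(single_link(A, B), 3))    # 3.0   (from (0, 0) to (3, 0))
print(round(complete_link(A, B), 3))  # 5.099 (from (0, 0) to (5, 1), i.e. sqrt(26))
```

Single linkage tends to chain clusters together through nearby points. Complete linkage favours compact clusters, because a single distant pair is enough to keep two clusters apart.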

Hierarchical clustering offers analysts good visualization potential, because its output is a hierarchical classification of the dataset. Such trees (hierarchies), usually drawn as dendrograms, can be used in many ways. For example, a tree can be cut at a chosen height to obtain a flat set of clusters.
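One common way to use such a tree is to cut it at a chosen height to obtain a flat clustering. The sketch below uses a hypothetical nested-tuple encoding of the tree, where a leaf is a label and an internal node is `(height, left, right)`. Real libraries store the hierarchy differently (SciPy, for example, uses a linkage matrix), and the example tree is made up.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a label, an internal node is (merge height, left, right)
Tree = Union[str, Tuple[float, "Tree", "Tree"]]

def leaves(t: Tree) -> List[str]:
    """All leaf labels under t, left to right."""
    if isinstance(t, str):
        return [t]
    _, left, right = t
    return leaves(left) + leaves(right)

def cut(t: Tree, h: float) -> List[List[str]]:
    """Cut at height h: each subtree whose root merge happened at or below h is one cluster."""
    if isinstance(t, str) or t[0] <= h:
        return [leaves(t)]
    _, left, right = t
    return cut(left, h) + cut(right, h)

tree = (9.0, (1.0, "a", "b"), (4.0, "c", (2.0, "d", "e")))
print(cut(tree, 5.0))  # [['a', 'b'], ['c', 'd', 'e']]
print(cut(tree, 3.0))  # [['a', 'b'], ['c'], ['d', 'e']]
```

Lowering the cut height yields more, smaller clusters. A height above the root merge returns all leaves as a single cluster.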

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN). Density-based techniques are known for their ability to discover clusters of unusual, e.g. non-circular, shapes.
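For contrast with the hierarchical methods, here is a compact DBSCAN sketch. A point with at least `min_pts` neighbours within radius `eps` (counting itself) is a core point. Clusters grow outward from core points, and points that no core point reaches are labelled noise (`-1`). The sketch uses a naive neighbour search and assumes Euclidean distance. The parameter values and points in the example are made up.

```python
from math import dist

NOISE = -1

def dbscan(points, eps: float, min_pts: int) -> list:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbours(i):
        # Naive scan over all points; the eps-neighbourhood includes the point itself
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)  # j is also a core point: keep growing the cluster
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means or a tree cut to k clusters, the number of clusters is not fixed in advance. It follows from `eps` and `min_pts`.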

Suggested learning sources to look into

  • Han, Kamber and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
6 votes, 2 answers

Clustering and distance calculation in Julia

I have a collection of n coordinate points of the form (x,y,z). These are stored in an n x 3 matrix M. Is there a built in function in Julia to calculate the distance between each point and every other point? I'm working with a small number of…
lara
6 votes, 2 answers

How to calculate clustering entropy? A working example or software code

I would like to calculate entropy of this example scheme http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html Can anybody please explain step by step with real values? I know there are an unlimited number of formulas but i…
Furkan Gözükara
6 votes, 1 answer

sklearn agglomerative clustering input data

I have a similarity matrix between four users. I want to do an agglomerative clustering. The code is like this: lena = np.matrix('1 1 0 0;1 1 0 0;0 0 1 0.2;0 0 0.2 1') X = np.reshape(lena, (-1, 1)) print("Compute structured hierarchical…
printemp
6 votes, 2 answers

Installing the Kmeans PostgreSQL extension on Amazon RDS

I am working on a Django project and we use geo data (with GeoDjango). I have installed PostGIS as described in the AWS docs. We have a lot of points (markers) on the map, and we need to cluster them. I found one library, anycluster. This…
6 votes, 2 answers

Corner Detection in 2D Vector Data

I am trying to detect corners (x/y coordinates) in 2D scatter vectors of data. The data is from a laser rangefinder and our current platform uses Matlab (though standalone programs/libs are an option, but the Nav/Control code is on Matlab so it must…
6 votes, 1 answer

hierarchical classification + topic model training data for internet articles and social media

I want to classify large numbers (100K to 1M+) of smallish internet-based articles (tweets, blog articles, news, etc) by topic. Toward this goal, I have been looking for labeled training data documents which I could use to build classifier…
6 votes, 1 answer

Cutting dendrogram into n trees with minimum cluster size in R

I'm trying to use hierarchical clustering (specifically hclust) to cluster a data set into 10 groups with sizes of 100 members or fewer, and with no group having more than 40% of the total population. The only method I currently know is to…
Bryan
6 votes, 4 answers

Best way to test a clustering algorithm

What is the best way to test a clustering algorithm? I am using an agglomerative clustering algorithm with a stop criterion. How do I test if the clusters are formed correctly or not?
London guy
5 votes, 1 answer

Extract rows of clusters in hierarchical clustering using seaborn clustermap

I am using hierarchical clustering from seaborn.clustermap to cluster my data. This works fine to nicely visualize the clusters in a heatmap. However, now I would like to extract all row values that are assigned to the different clusters. This is…
pr94
5 votes, 1 answer

memory error during hierarchical clustering Python 3.6

I have a fairly large data set (1841000*32 matrix) I wish to run a hierarchical clustering algorithm on. Both the AgglomerativeClustering class and the FeatureAgglomeration class in sklearn.cluster give the below error. …
5 votes, 1 answer

How to convert a dendrogram to a tree object in python?

I'm trying to use the scipy.hierarchy.cluster module to hierarchically cluster some text. I've done the following: l = linkage(model.wv.syn0, method='complete', metric='cosine') den = dendrogram( l, leaf_rotation=0., leaf_font_size=16., …
5 votes, 1 answer

How to line (cut) a dendrogram at the best K

How do I draw a line in a dendrogram that corresponds to the best K for a given criterion? Like this: Let's suppose that this is my dendrogram, and the best K is 4. data("mtcars") myDend <- as.dendrogram(hclust(dist(mtcars))) plot(myDend) I know that…
Gilgamesh
5 votes, 1 answer

Swap leafs of Python scipy's dendrogram/linkage

I generated a dendrogram plot for my dataset and I am not happy how the splits at some levels have been ordered. I am thus looking for a way to swap the two branches (or leaves) of a single split. If we look at the code and dendrogram plot at the…
dmeu
5 votes, 1 answer

How can I get clusters from distance matrix, using PHP?

I have a distance matrix as a two-dimensional array, like this: So, I need to find clusters of elements with its help. I can do it using hierarchical clustering, like k-means. I have found such an example here PHP K-Means How can I convert my…
Bogdan Lashkov
5 votes, 1 answer

Alternative to scipy.cluster.hierarchy.cut_tree()

I was doing an agglomerative hierarchical clustering experiment in Python 3 and I found scipy.cluster.hierarchy.cut_tree() is not returning the requested number of clusters for some input linkage matrices. So, by now I know there is a bug in the…
PDRX