I would like to use matplotlib to draw a dendrogram without using scipy. A similar question has been posted here; however, the marked solution suggests using scipy, and the links in the other answers, which suggest using ETE, do not work. Using this example, I have verified the accuracy of my own method (i.e., not the scipy method) for agglomerative hierarchical clustering with the single-linkage criterion.
Using the same example linked above, I have the necessary parameters to create my own dendrogram. The original distance_matrix is given by:
.. DISTANCE MATRIX (SHAPE=(6, 6)):
[[ 0 662 877 255 412 996]
[662 0 295 468 268 400]
[877 295 0 754 564 0]
[255 468 754 0 219 869]
[412 268 564 219 0 669]
[996 400 0 869 669 0]]
A masked array of distance_matrix is used so that the diagonal entries above are not counted as minimums. The masked version of the original distance_matrix is given by:
.. MASKED (BEFORE) DISTANCE MATRIX (SHAPE=(6, 6)):
[[-- 662 877 255 412 996]
[662 -- 295 468 268 400]
[877 295 -- 754 564 0]
[255 468 754 -- 219 869]
[412 268 564 219 -- 669]
[996 400 0 869 669 --]]
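As a minimal sketch of the masking step, the snippet below masks only the diagonal and then locates the smallest remaining entry. It uses the top-left 3x3 corner of the matrix above to stay short; the variable names `corner`, `masked`, `row`, and `col` are illustrative and not taken from my actual code.

```python
import numpy as np

# Top-left 3x3 corner of the distance matrix shown above.
corner = np.array([
    [0, 662, 877],
    [662, 0, 295],
    [877, 295, 0],
])

# Mask only the diagonal so self-distances are never picked as the minimum.
masked = np.ma.masked_array(corner, mask=np.eye(len(corner), dtype=bool))

# argmin ignores masked entries; unravel_index converts the flat index
# back into a (row, column) pair.
row, col = np.unravel_index(masked.argmin(), masked.shape)
print(masked)
print(row, col, masked[row, col])
```

Because the matrix is symmetric, the minimum appears twice; `argmin` returns the first occurrence in row-major order, so the pair comes back with the smaller index first.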
distance_matrix is changed in-place at every iteration of the algorithm. Once the algorithm has completed, distance_matrix is given by:
.. MASKED (AFTER) DISTANCE MATRIX (SHAPE=(1, 1)):
[[--]]
The levels (minimum distance of each merger) are given by:
.. 5 LEVELS:
[138, 219, 255, 268, 295]
We can also view the indices of the merged datapoints at every iteration; these indices refer to positions in the original distance_matrix, since reducing its dimensions changes the index positions. These indices are given by:
.. 5x2 LOCATIONS:
[(2, 5), (3, 4), (0, 3), (0, 1), (0, 2)]
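A minimal sketch of how such levels and locations could be produced is given below. It makes several assumptions: the function name `single_linkage` is hypothetical, absorbed clusters are masked out rather than deleted (so the matrix keeps its original shape and indices, unlike the in-place reduction described above), each cluster is identified by the smallest original index it contains, and the MI-TO distance at positions (2, 5) and (5, 2) is taken to be 138, matching the first reported level.

```python
import numpy as np

def single_linkage(distance_matrix):
    """Return (levels, locations) for single-linkage agglomerative clustering."""
    dist = np.array(distance_matrix, dtype=float)   # working copy
    n = len(dist)
    mask = np.eye(n, dtype=bool)                    # hide self-distances
    levels, locations = [], []
    for _ in range(n - 1):
        masked = np.ma.masked_array(dist, mask=mask)
        i, j = np.unravel_index(masked.argmin(), masked.shape)
        i, j = sorted((int(i), int(j)))             # keep the smaller index
        levels.append(float(dist[i, j]))
        locations.append((i, j))
        # Single linkage: the merged cluster's distance to every other
        # cluster is the element-wise minimum of the two merged rows.
        merged = np.minimum(dist[i], dist[j])
        dist[i, :] = merged
        dist[:, i] = merged
        # Cluster j has been absorbed into i, so hide its row and column.
        mask[j, :] = True
        mask[:, j] = True
    return levels, locations

# Assumption: entries (2, 5) and (5, 2) are 138, per the first level above.
distance_matrix = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]
levels, locations = single_linkage(distance_matrix)
print(levels)
print(locations)
```

Masking instead of deleting rows avoids having to map reduced indices back to the original ones, at the cost of scanning the full matrix on every iteration.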
From these indices, the ordering of the xticklabels of the dendrogram is given chronologically as:
.. 6 XTICKLABELS
[2 5 3 4 0 1]
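The chronological ordering can be sketched as a first-appearance scan over the merge pairs. The helper name `leaf_order` is hypothetical. This reproduces the ordering above for this example, but I am not certain that a first-appearance ordering keeps every cluster contiguous along the axis for arbitrary merge sequences.

```python
def leaf_order(locations):
    """List each original index in the order it first appears in a merge."""
    order = []
    for pair in locations:
        for index in pair:
            if index not in order:
                order.append(index)
    return order

locations = [(2, 5), (3, 4), (0, 3), (0, 1), (0, 2)]
xticklabels = leaf_order(locations)
print(xticklabels)
```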
In relation to the linked example,
0 = BA
1 = FI
2 = MI
3 = NA
4 = RM
5 = TO
Using these parameters, I would like to generate a dendrogram that looks like the one below (borrowed from linked example):
My attempt at trying to replicate this dendrogram using matplotlib is below:
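import numpy as np
import matplotlib.pyplot as plt

# Parameters from the clustering run above
levels = [138, 219, 255, 268, 295]
locations = [(2, 5), (3, 4), (0, 3), (0, 1), (0, 2)]
xticklabels = [2, 5, 3, 4, 0, 1]

fig, ax = plt.subplots()
for loc, level in zip(locations, levels):
    x = np.array(loc)                # original indices of the merged pair
    y = level * np.ones(x.size)      # constant height at the merge level
    ax.step(x, y, where='mid')
ax.set_xticks(xticklabels)
# ax.set_xticklabels(xticklabels)
plt.show()
plt.close(fig)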
My attempt above produces the following figure:
I realize I have to reorder the xticklabels such that the first merged points appear at the right edge, with each subsequent merger shifting towards the left; doing so necessarily means adjusting the width of the connecting lines. Also, I was using ax.step instead of ax.bar so that the lines would appear more organized (as opposed to rectangular bars everywhere); the only thing I can think to do is to draw horizontal and vertical lines using ax.axhline and ax.axvline. I am hoping there is a simpler way to accomplish what I would like. Is there a straightforward approach using matplotlib?
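To make the line-drawing idea concrete, here is a rough sketch of what I have in mind, using ax.plot to draw one bracket per merger rather than ax.axhline and ax.axvline. The function name `draw_dendrogram` and its arguments are hypothetical. It assumes each cluster is identified by the smallest original index it contains (as in the locations above) and that the leaf ordering keeps every cluster contiguous on the x-axis.

```python
import matplotlib.pyplot as plt

def draw_dendrogram(ax, locations, levels, leaf_order, names=None):
    """Draw one bracket per merger; clusters are keyed by their smallest index."""
    # Horizontal position and current height of every cluster.
    x = {leaf: float(pos) for pos, leaf in enumerate(leaf_order)}
    y = {leaf: 0.0 for leaf in leaf_order}
    for (a, b), level in zip(locations, levels):
        # Up from cluster a, across at the merge level, down to cluster b.
        ax.plot([x[a], x[a], x[b], x[b]], [y[a], level, level, y[b]], color="C0")
        keep = min(a, b)
        x[keep] = (x[a] + x[b]) / 2   # merged cluster sits midway
        y[keep] = level               # later brackets start at this height
    ax.set_xticks(range(len(leaf_order)))
    labels = [names[i] if names else str(i) for i in leaf_order]
    ax.set_xticklabels(labels)
    ax.set_ylim(bottom=0)

levels = [138, 219, 255, 268, 295]
locations = [(2, 5), (3, 4), (0, 3), (0, 1), (0, 2)]
xticklabels = [2, 5, 3, 4, 0, 1]
names = {0: "BA", 1: "FI", 2: "MI", 3: "NA", 4: "RM", 5: "TO"}

fig, ax = plt.subplots()
draw_dendrogram(ax, locations, levels, xticklabels, names)
# Call plt.show() here to display the figure.
```

Passing `xticklabels[::-1]` instead should place the first merged pair at the right edge, since reversing the order keeps each cluster contiguous.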