Questions tagged [hierarchical-clustering]

Hierarchical clustering is a clustering technique that generates clusters at multiple hierarchical levels, thereby generating a tree of clusters. Hierarchical clustering provides advantages to analysts with its visualization potential.

Examples

Common methods include DIANA (DIvisive ANAlysis), which performs top-down clustering: it usually starts from the entire data set and keeps dividing it until either each data point resides in its own cluster or a user-defined condition is reached.

Another widely known method is AGNES (AGglomerative NESting), which performs the opposite of DIANA: it works bottom-up, starting with each data point in its own cluster and repeatedly merging clusters.
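The bottom-up idea can be sketched in a few lines of plain Python. This is only an illustration of the agglomerative principle, not the AGNES implementation itself: the function name `agglomerate`, the `num_clusters` stopping condition, the single-link merge rule, and the sample points are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def agglomerate(points, num_clusters=1):
    """Bottom-up sketch: repeatedly merge the two closest clusters.

    Returns (clusters, merges), where merges lists every merge as
    (cluster_a, cluster_b, distance) in the order it happened.
    """
    # Start with every point in its own singleton cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > num_clusters:
        # Find the pair of clusters whose closest members are nearest
        # (single-link distance; an assumption for this sketch).
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Delete the higher index first so the lower one stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sample data: two tight pairs and one outlying point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, num_clusters=2)
```

The `merges` list records the order in which clusters were joined, which is the information a dendrogram draws. On the sample points, stopping at two clusters leaves the outlying point (10, 0) on its own. The exhaustive pair search keeps the sketch short but is slow on large inputs.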

Distance metric & some advantages

There are a multitude of ways to compute the distance between clusters, on which the clustering techniques base their decisions to divide clusters or merge them into new ones (for example, complete-link and single-link distances, which take the maximum and the minimum pairwise distance between two clusters, respectively).
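As a small illustration of the two link distances mentioned above, here is a sketch assuming Euclidean distance between points. The function names `single_link` and `complete_link` and the two sample clusters are made up for the example.

```python
from math import dist


def single_link(cluster_a, cluster_b):
    """Single link: distance between the two closest members."""
    return min(dist(a, b) for a in cluster_a for b in cluster_b)


def complete_link(cluster_a, cluster_b):
    """Complete link: distance between the two farthest members."""
    return max(dist(a, b) for a in cluster_a for b in cluster_b)


# Hypothetical clusters lying on the x-axis.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]

single = single_link(left, right)      # closest pair: (1, 0) and (4, 0)
complete = complete_link(left, right)  # farthest pair: (0, 0) and (6, 0)
```

For these clusters the single-link distance is 3.0 and the complete-link distance is 6.0, so the same pair of clusters can look near or far depending on the linkage chosen.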

Hierarchical clustering provides advantages to analysts with its visualization potential, given its output of the hierarchical classification of a dataset. Such trees (hierarchies) could be utilized in a myriad of ways.

Other non-hierarchical clustering techniques

Other clustering methodologies include, but are not limited to, partitioning techniques (such as k-means and PAM) and density-based techniques (such as DBSCAN), which are known for their ability to discover clusters of unusual shapes (such as non-circular shapes).

Suggested learning sources to look into

  • Han, Kamber and Pei's Data Mining book, whose lecture slides and companion material can be found here.
  • Wikipedia has an entry on the topic here.
1187 questions
-1
votes
3 answers

Decision Tree, What is Wrong here?

I took a contest two days ago. One of our questions is as follows: a decision tree with depth 2 is constructed for two binary features. How many features are in hypothesis space that can be shown with the following tree? The answer sheet say…
-1
votes
1 answer

Dendrogram: figue.js (clustering) output to fit visualization

It seems that machine learning via JavaScript is in its infancy, as there are hardly any libraries which fit each other calculation-wise and visualization-wise. I am using the figue.js library and wish to output the result via the…
basickarl
  • 37,187
  • 64
  • 214
  • 335
-1
votes
1 answer

Efficient algorithm for dendrogram cutoff

I have implemented an algorithm for hierarchical clustering and a simple method for drawing the dendrogram in C#. Now I want to add a dendrogram cutoff method and another one for coloring dendrogram branches. What would be an efficient algorithm to do…
Sebastian Widz
  • 1,962
  • 4
  • 26
  • 45
-1
votes
1 answer

Determining if algorithm is hierarchical or density allied

I'm trying to cluster points in my dataset. The simple steps are as follows: Find the nearest neighbor for each point. Eliminate noise points by setting a threshold for nearest neighbor parameter (those points with large enough nearest neighbor…
user3482970
  • 65
  • 2
  • 5
-1
votes
1 answer

How to crawl news websites (content only)?

I want to crawl Indian news websites and their archives (e.g. thehindu.com, indianexpress.com and timesofindia.com). I have heard of a boilerplate library in Java used to extract content. But is there any library in Python to do this, and how to do…
mridul
  • 105
  • 2
  • 6
-1
votes
1 answer

How to generate a labelled dendrogram using agnes?

Using the code from: http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Hierarchical_Clustering Here is how to generate a dendrogram: # import data x <- read.table("data.txt") # run AGNES ag <- agnes (x, false,…
blue-sky
  • 51,962
  • 152
  • 427
  • 752
-1
votes
2 answers

Clustering - how to find the nearest to a cluster

Hints I got as to a different question puzzled me quite a bit. I got an exercise, actually part of a larger exercise: Cluster some data, using hclust (done) Given a totally new vector, find out to which of the clusters you got in 1 it is…
newnewbie
  • 993
  • 2
  • 11
  • 26
-1
votes
2 answers

labeling a dendrogram after using hclust

Suppose you cluster a matrix that has a header line in R, using hclust. Usually one would get a labeled picture, so to speak, a dendrogram. Is there a way to make the labels of the vectors (which are in the header line) appear within the dendrogram?
newnewbie
  • 993
  • 2
  • 11
  • 26
-1
votes
1 answer

How to generate a plot of residuals versus predictor variable for a mixed model?

My mixed model is as follows: model <- lme(Cost~1+Units, random=~1+Units|Factory, method="ML", data=A) I was told to apply the code below to plot residuals versus fitted values and it worked: plot(fitted(model), resid(model)) However, it showed me…
Guess Gucci
  • 253
  • 1
  • 3
  • 11
-2
votes
2 answers

Python how to get the max number in each cluster

I'm working on the k-means algorithm to cluster a list of numbers. Let's say my list is my_list = [13, 15, 13, 23, 45, 25, 7]. How could I use k-means to group it into clusters of similar numbers? So the output would be this: clusters = { 1 : [7], 2…
man.utd_21
  • 19
  • 7
-2
votes
1 answer

Topology change: star to hierarchical?

I'm not familiar with omnet and adhoc network. I have investigated many papers, theses, books and tutorials (ex: Tictoc, inet framework..). Therefore, I worry that this work has been for a long time. Now I have to simulate the different topology…
-2
votes
1 answer

Clustering GPS data into "k" Groups

I have a list of GPS data (~3000 longitude and latitude pairs) and I would like to split it into "k" groups based on distance (geodesic and/or Euclidean). What's the best way to do this?
-2
votes
3 answers

Clustering a set of countries based on cultural similarity in R

I am having some problems trying to cluster countries using a sort of cultural correlation that I already have. Basically, the dataset looks like this: with 90 countries, 91 columns (90 country columns + one to identify the nations on the rows) and…
-2
votes
1 answer

Calinski-Harabasz F index for hierarchical clustering

How do I compute the Calinski-Harabasz F index in R for hierarchical clustering? I need it to determine the optimal number of clusters.
-2
votes
2 answers

Hierarchical clustering and the probability of belonging to a cluster - Weka

I have a bunch of data. I am clustering this data with a hierarchical clustering algorithm. How can I find the probability that an instance belongs to a cluster? Once I have found the probability, I will make some calculations. Then, I will add this instance my…