Hierarchical classification top down approach to machine learning

Question

I have a dataset of sentences that have been annotated with labels from a hierarchy. The hierarchy is a selection of music genres. It is a tree, not a DAG - each node has one parent and one parent only. Here is an extract as an example:

root = music
     parent = latin
            child = afro-cuban
                    child = salsa
            child = brazilian
                    child = axe
     parent = non-latin
            child = classical
     ...

For example, for the sentence "Mozart is the best", the majority of the collected annotations agree that the class label (the ground truth) is classical. From the hierarchy, we know that classical is also a form of non-latin music, which is a form of music. By contrast, "i prefer salsa" might have been annotated as latin.

In terms of classification, flattening the hierarchy - which I have done - intuitively does not solve the problem, as we're completely ignoring the class hierarchy. It also produces low results whilst using Weka, and a selection of classifiers, as we're faced with a multiclass classification problem.

My problem is, I've read very vague literature and online articles about how hierarchical classification is implemented. I'd like to use Weka and Python. But I just wanted clarification of how to perform hierarchical classification in this situation. So my questions are:

1) What is the best way of going about this? Would implementing a top-down approach be the best option? If I do this, how do I avoid inconsistent predictions across levels? E.g. it could predict latin on level 1, and classical on level 2. What about a binary classifier? I'm open to suggestions.

2) how does training and testing data come into this?

3) how can one evaluate classification performance? Particularly with a top-down approach, as we'll have evaluations for every separate level.

Doesn't it suffice to predict level3? Because, if you know that something is "`classical`", then it implicitly holds that level2= `non-latin`, and level1 = `music`. — knb, Apr 11 '18 at 11:15
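To illustrate the point in this comment, here is a minimal sketch of how the higher-level labels follow from a single predicted label. The `PARENT` dictionary and the `path_from_root` helper are names made up for this example; the dictionary only covers the extract of the hierarchy shown in the question.

```python
# Child -> parent map for the hierarchy extract in the question.
# (Assumption: only the nodes shown in the extract are included.)
PARENT = {
    "latin": "music",
    "non-latin": "music",
    "afro-cuban": "latin",
    "brazilian": "latin",
    "salsa": "afro-cuban",
    "axe": "brazilian",
    "classical": "non-latin",
}


def path_from_root(label):
    """Return the labels from the root down to `label`, inclusive."""
    path = [label]
    # Walk upwards until we reach a node with no parent (the root).
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


print(path_from_root("classical"))  # ['music', 'non-latin', 'classical']
```

Because the hierarchy is a tree, each label has exactly one such path, so a prediction at any level fixes the labels at all levels above it.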

score 2 · Answer 1 · answered Feb 03 '17 at 18:44

This survey article does a nice job of explaining the various strategies for hierarchical classification.

You can prevent inconsistent predictions, like the latin->classical example you gave, by controlling the training data used to train each of the subclassifiers. For example, you first train a binary classifier to distinguish between latin and non-latin, using all the data for training. You then train a classifier to distinguish between afro-cuban and brazilian, using only examples from those two classes as the training data. At inference time, you only pass an unlabeled example into the afro-cuban/brazilian classifier if the latin/non-latin classifier predicts 'latin'.
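The routing described above can be sketched in plain Python. This is only an illustration under stated assumptions: `WordCountClassifier` is a toy stand-in invented for this example (any real classifier from Weka or a Python library could take its place), and `PARENT`, `train_top_down` and `predict_top_down` are hypothetical names, not part of any library.

```python
from collections import Counter, defaultdict

# Child -> parent map for the hierarchy extract in the question.
PARENT = {
    "latin": "music", "non-latin": "music",
    "afro-cuban": "latin", "brazilian": "latin",
    "salsa": "afro-cuban", "axe": "brazilian",
    "classical": "non-latin",
}


def path_from_root(label):
    """Return the labels from the root down to `label`, inclusive."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


class WordCountClassifier:
    """Toy stand-in for a real classifier: scores classes by word overlap."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        return self

    def predict(self, text):
        words = text.lower().split()
        # sorted() makes tie-breaking deterministic (alphabetical order).
        return max(sorted(self.counts),
                   key=lambda c: sum(self.counts[c][w] for w in words))


def train_top_down(texts, labels):
    """Train one classifier per parent node, using only the examples
    that fall under that node, relabelled with the child to descend to."""
    data = defaultdict(lambda: ([], []))
    for text, label in zip(texts, labels):
        path = path_from_root(label)
        for node, child in zip(path, path[1:]):
            data[node][0].append(text)
            data[node][1].append(child)
    return {node: WordCountClassifier().fit(x, y)
            for node, (x, y) in data.items()}


def predict_top_down(models, text, root="music"):
    """Start at the root and follow each node's classifier downwards."""
    node, path = root, [root]
    while node in models:
        node = models[node].predict(text)
        path.append(node)
    return path


texts = ["mozart is the best", "i prefer salsa",
         "axe from bahia", "beethoven symphony"]
labels = ["classical", "salsa", "axe", "classical"]
models = train_top_down(texts, labels)
print(predict_top_down(models, "salsa night"))
```

Each classifier can only choose among the children of its own node, so a path such as latin->classical cannot be produced. Note that this sketch always descends as far as the training labels allow; predicting an internal node such as latin as the final answer would need an extra stopping rule, which is not shown here.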

score 0 · Answer 2 · answered Apr 24 '16 at 17:43

I'm not sure I completely understand your problem, but from what I did understand, it sounds like a decision tree, or a more advanced algorithm like Random Forest, would be a good choice. You will need to build the tree, maybe use some NLP techniques to remove unnecessary words like "is", "I", "the" (probably, but this needs a closer look), and use the words as features for the tree.

As for the second question, you should probably read up on machine learning. Andrew Ng's course on Coursera is a good place to start. But to answer your question: the training data is the part of the data you choose to train on, and the test data is the part you use to evaluate your algorithm's performance. This should also answer your third question.
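For the third question specifically, one measure sometimes used with hierarchies is hierarchical precision and recall, where each true and predicted label is extended with its ancestors (excluding the root) before counting overlaps. The sketch below is an illustration of that idea, not a method given in this thread; `PARENT`, `ancestors_and_self` and `hierarchical_prf` are names invented for the example.

```python
# Child -> parent map for the hierarchy extract in the question.
PARENT = {
    "latin": "music", "non-latin": "music",
    "afro-cuban": "latin", "brazilian": "latin",
    "salsa": "afro-cuban", "axe": "brazilian",
    "classical": "non-latin",
}


def ancestors_and_self(label, root="music"):
    """Return `label` plus all of its ancestors, excluding the root."""
    nodes = set()
    while label != root:
        nodes.add(label)
        label = PARENT[label]
    return nodes


def hierarchical_prf(true_labels, predicted_labels):
    """Hierarchical precision, recall and F1 over a test set."""
    overlap = pred_total = true_total = 0
    for t, p in zip(true_labels, predicted_labels):
        t_set, p_set = ancestors_and_self(t), ancestors_and_self(p)
        overlap += len(t_set & p_set)   # nodes shared by both paths
        pred_total += len(p_set)
        true_total += len(t_set)
    hp = overlap / pred_total if pred_total else 0.0
    hr = overlap / true_total if true_total else 0.0
    f1 = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, f1


# Predicting axe for a salsa sentence still gets credit for "latin".
hp, hr, f1 = hierarchical_prf(["salsa", "classical"], ["axe", "classical"])
print(hp, hr, f1)
```

With this kind of measure a single score covers all levels, and a mistake deep in the tree costs less than a mistake at the top level.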

lazary


Thanks for your answer. I'm aware of the purpose of training and testing data, and what they do, and machine learning in general. I just don't understand how you'd go about doing it in a hierarchical way. So you suggest extracting features from my sentences? This only represents the sentence in a smaller space. How does this resolve my hierarchical issue? The problem is that a class belongs to multiple levels. As I explained, a node can also be classified as its parent, because conceptually, that'd also be correct. How does one do this? – user47467 Apr 24 '16 at 17:58
Extracting features from sentences: well yes, first of all this will remove a lot of unnecessary data, and second, in sentences like the ones in your example, it will keep just the meaningful words. The hierarchical issue will be solved in trees, and it is also quite intuitive. For example, in the sentence _"i prefer salsa"_, salsa will be a strong feature, and as it enters your classifier, it will probably give a strong signal for afro-cuban or latin or music, depending on what your labels are. In what part of the tree exactly are you interested? Top level? Second one? – lazary Apr 24 '16 at 18:09
I have no prior knowledge of trees, so if you know or could help me out with how you'd do this in Weka, that'd be great. I'm interested in the entire tree; I guess this depends on the class label the sentence has? – user47467 Apr 24 '16 at 18:21
I found [this](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/) intro once, it contains a part on trees and is very visualized. I really like it. Hope it helps – lazary Apr 24 '16 at 18:26
I'm not sure decision trees are what I'm looking for. Previous papers I have read are classifying on each node as binary classification, or on each hierarchical level. There's no clarification as to how this is done, or tutorials online, unless you know of any? That'd be great. – user47467 Apr 24 '16 at 18:39

score 0 · Answer 3 · answered Apr 11 '18 at 05:13

Hierarchical classification means organizing the classes hierarchically, creating a tree or DAG (Directed Acyclic Graph) of categories, and exploiting the information on the relationships among them.

We take what is called a top-down approach, training a classifier per level (or node) of the tree (again, although this is not the only hierarchical approach, it is definitely the most widely used and the one we’ve selected for our problem at hand), where a given decision will lead us down a different classification path.

MUSIC EXAMPLE from the blog post linked below: we start by training a classifier to predict, let’s say, the genre of the music (Death Metal). Then we train another classifier to predict, for example, the nationality of the band (Swedish), and then we can have a classifier trained on predicting existing bands within that sub-group (Arch Enemy, At the Gates, …).

Check out this post on hierarchical classification for more detailed info.

https://www.kdnuggets.com/2018/03/hierarchical-classification.html
