
If a decision tree splits into two classes, how is a random forest able to create multiple buckets in classification? Can you post any link about this theory? What is the theory behind it?

Anjali

1 Answer


A junction in a decision tree doesn't split into two classes; it splits into two subtrees. The outcome of a decision tree is determined by following junctions until you arrive at a leaf node. A simple 2-level tree with 3 binary junctions has 4 leaves:

    J
   / \
  J   J
 / \ / \
L  L L  L

With 4 leaves, you can have up to 4 classes, but in general several leaf nodes will belong to the same class.
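To make this concrete, here is a minimal sketch (assuming scikit-learn is available; the iris dataset is just a convenient 3-class example) showing that a depth-2 tree has at most 4 leaves and can still cover 3 classes:

    # Minimal sketch, assuming scikit-learn. Iris has 3 classes; a depth-2
    # tree has at most 4 leaves (a pure node stops splitting early).
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    print(tree.get_n_leaves())   # up to 4 leaves at depth 2
    print(export_text(tree))     # each leaf line ends with one predicted class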

Of course, in a forest each tree has many leaves and there are many trees, so there are many, many leaves across the whole forest.

You can even look at how the trees vote to judge how reliable an outcome is. If your forest has 100 trees and 3 classes, one input may result in a 90-6-4 vote distribution and another input may give a 50-30-20 distribution. Both inputs are apparently class 1, but the second input is less certainly so.
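As a hedged illustration of that vote counting (scikit-learn assumed again; note that its forests actually average per-tree probabilities rather than counting hard votes, so this sketches the idea rather than the library's exact rule):

    # Sketch: tally per-tree votes of a 100-tree forest on a 3-class problem.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    sample = X[60:61]   # one input, kept 2-D as predict() expects
    votes = [int(t.predict(sample)[0]) for t in forest.estimators_]
    print(np.bincount(votes, minlength=3))  # lopsided count = confident call
    print(forest.predict_proba(sample))     # averaged probabilities, same story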

MSalters
  • Thank you for your response. It does help. Do you have any link where I can read more of this theory, as in how it calculates the Gini index in multi-class? – Anjali Jan 31 '18 at 13:46
  • Gini importance? See https://stats.stackexchange.com/questions/92419/relative-importance-of-a-set-of-predictors-in-a-random-forests-classification-in. In general, StackOverflow can help you more with implementation aspects of random forests, but theoretical questions are better asked over on stats.SE (but there too, check first if the question has already been asked) – MSalters Jan 31 '18 at 14:09
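On the Gini question from the comments: the impurity formula is not specific to two classes. A self-contained sketch (plain Python, no library assumed) of the multi-class Gini impurity G = 1 - sum over classes of p_k^2, which each split tries to reduce:

    # Multi-class Gini impurity: G = 1 - sum_k p_k^2, where p_k is the
    # fraction of the node's samples belonging to class k.
    from collections import Counter

    def gini(labels):
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    print(gini([0, 0, 0, 0]))        # 0.0   -> pure node
    print(gini([0, 1, 2, 0, 1, 2]))  # 0.667 -> evenly mixed 3-class node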