
I am learning the decision tree method for machine learning. Right now, the most important piece of code I use is C5.0. I have to admit, it is a work of genius, but I can't understand how it chooses the root and decision nodes. Example: I have a data frame named 'credit'. Here are the first few columns:

 str(credit)
 'data.frame':   1000 obs. of  21 variables:
 $ checking_balance    : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1 4 4 3 4 3 ...
 $ months_loan_duration: int  6 48 12 42 24 36 24 36 12 30 ...
 $ credit_history      : Factor w/ 5 levels "critical","delayed",..: 1 5 1 5 2 5 5 5 5 1 ...
 $ purpose             : Factor w/ 10 levels "business","car (new)",..: 8 8 5 6 2 5 6 3 8 2 ...
 $ amount              : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...

When I look at the decision tree produced by C5.0, I see that the root node is checking_balance and the next decision node is credit_history. What strategy does C5.0 follow when building a decision tree? In other words, how does it determine the order of the decision nodes?

HydraCc

1 Answer


Many resources explain the C5.0 algorithm and how to apply it, e.g. https://hub.packtpub.com/brett-lantz-on-implementing-a-decision-tree-using-c5-0-algorithm-in-r/ ; http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/08_DecisionTreeClass.html ; and, in my opinion the best resource, Quinlan, J., 2014. C4.5: Programs for Machine Learning. Elsevier. If you search these, you will find your answer.
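As a rough illustration of the idea those resources describe: trees in the C4.5/C5.0 family pick, at each node, the feature whose split makes the class labels in the resulting subsets most homogeneous, measured with entropy-based scores such as information gain or gain ratio. The sketch below is not the C5.0 implementation; it only handles categorical features and ignores numeric thresholds, pruning and boosting. The `toy` data frame and the helper names `entropy`, `info_gain` and `gain_ratio` are made up for this example, and the `default` column is an assumed target that does not appear in the `str()` output above.

```r
# Entropy (in bits) of a vector of class labels.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                      # drop empty classes so log2(0) never occurs
  -sum(p * log2(p))
}

# Information gain: entropy of y minus the weighted entropy of y
# within each level of the categorical feature x.
info_gain <- function(x, y) {
  w <- table(x) / length(x)                          # share of rows per level
  cond <- sapply(split(y, x, drop = TRUE), entropy)  # entropy inside each level
  entropy(y) - sum(w[names(cond)] * cond)
}

# Gain ratio: information gain normalised by the entropy of the split itself,
# which penalises features with many levels.
gain_ratio <- function(x, y) info_gain(x, y) / entropy(x)

# Hypothetical toy data, NOT the real 'credit' data set.
toy <- data.frame(
  checking_balance = c("< 0 DM", "< 0 DM", "> 200 DM", "unknown",
                       "unknown", "< 0 DM", "unknown", "> 200 DM"),
  credit_history   = c("critical", "delayed", "critical", "delayed",
                       "critical", "delayed", "delayed", "critical"),
  default          = c("yes", "yes", "no", "no", "no", "yes", "no", "no")
)

features <- c("checking_balance", "credit_history")
gains  <- sapply(toy[features], info_gain,  y = toy$default)
ratios <- sapply(toy[features], gain_ratio, y = toy$default)

print(gains)
print(ratios)

# The feature with the best score becomes the split at this node.
root <- names(which.max(gains))
print(root)
```

In this toy data checking_balance separates the classes perfectly, so it scores highest and would be chosen as the root. The same scoring is then repeated inside each branch on the rows that reach it, which is what produces the order of the decision nodes further down the tree.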

jared_mamrot