
I want to know which variables are important in my decision tree model.

I fitted the model with train() from the caret package, but the attribute usage results look strange for factor variables.

Below is my code.

set.seed(123)
ctrl <- trainControl(method = "cv", classProbs = TRUE, summaryFunction=twoClassSummary)
mDt <- train(metS ~ ., data= df_train, method = "C5.0", metric="ROC", trControl = ctrl); mDt

I got the attribute usage by using C5imp(). (The results by using summary(mDt) were the same.)

C5imp(mDt$finalModel)

The attribute usage results are as follows:

  • age 100.00
  • BMI 100.00
  • height 100.00
  • weight 100.00
  • job7 98.90
  • piHeatScore 83.81
  • dailyAlcoholIntake_final 82.96
  • pi4.L 67.14
  • familyIncome^9
  • pi17.C 60.33
  • pi6.C 59.72
  • pi13.L 56.53
  • ...

The strange thing is that a single factor variable (e.g. 'pi4': Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<"5") gets multiple attribute usage entries (e.g. 'pi4.L', 'pi4.Q', 'pi4.C', 'pi4^4').

It's similar for unordered factors. For example, 'marriage' is a factor w/ 6 levels ("1","2","3","4","5","6"), and the attribute usages are shown for 'marriage2', 'marriage3', 'marriage4', 'marriage5', and 'marriage6'.

However, the results should be like the following:

(The results below were obtained by calling C5.0() directly on the same data. Each factor variable gets a single attribute usage entry.)

mTemp <- C5.0(df_train[,-1], df_train$metS) 
C5imp(mTemp)
  • BMI 100.00
  • age 32.37
  • pi6 27.28
  • pi13 16.92
  • pi9 15.76
  • job 9.07
  • pi14 2.88
  • ...

I think this is caused by a difference in how the C5.0 method is applied by C5.0() and by train().

I want to use train() from the caret package because it automatically applies cross-validation and so on.
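One way to see where names such as 'pi4.L' and 'marriage2' can come from is base R's model.matrix(), which expands the factors in a formula into contrast columns. The sketch below uses made-up toy data (the values are assumptions; only the factor types mirror the ones described above). Whether train()'s formula interface does exactly this internally is an assumption here, although it is consistent with the output shown.

```r
# Toy data: the values are invented; only the factor types mirror the question.
toy <- data.frame(
  metS     = factor(c("no", "yes", "no", "yes", "no", "yes")),
  pi4      = factor(c(1, 2, 3, 4, 5, 1), levels = 1:5, ordered = TRUE),
  marriage = factor(c(1, 2, 3, 4, 5, 6))
)

# Expand the predictors the way a formula interface typically does:
# ordered factors get polynomial contrasts, unordered factors get dummies.
expanded <- model.matrix(metS ~ ., data = toy)
expanded_names <- colnames(expanded)
print(expanded_names)
```

With R's default contrasts, the column names contain 'pi4.L', 'pi4.Q', 'pi4.C', 'pi4^4' and 'marriage2' to 'marriage6' instead of a single 'pi4' and a single 'marriage' column, so a model fitted on this matrix would report usage per column.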

Please help me.

  • I found the answer myself :) The problem was solved when I changed the way the data are passed to train(), from the formula interface to x and y. https://stats.stackexchange.com/questions/135671/how-does-caret-handle-factors – Amy Jun 04 '19 at 04:14
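A minimal sketch of the x/y form described in the comment, assuming the caret and C50 packages are installed and that the outcome sits in the first column, as df_train[,-1] in the question suggests. The wrapper name fit_c50_xy and its arguments are hypothetical and only serve to keep the sketch self-contained.

```r
# Hypothetical helper: fit C5.0 through caret's x/y interface so that factor
# columns are passed to the model as they are, not as formula-expanded columns.
fit_c50_xy <- function(train_df, outcome_col = 1, seed = 123) {
  set.seed(seed)
  ctrl <- caret::trainControl(
    method          = "cv",
    classProbs      = TRUE,
    summaryFunction = caret::twoClassSummary
  )
  caret::train(
    x         = train_df[, -outcome_col, drop = FALSE],  # predictors only
    y         = train_df[[outcome_col]],                 # two-class factor outcome
    method    = "C5.0",
    metric    = "ROC",
    trControl = ctrl
  )
}

# Usage with the objects from the question (not run here):
# mDt_xy <- fit_c50_xy(df_train)
# C50::C5imp(mDt_xy$finalModel)
```

Under these assumptions the attribute usage should list one entry per factor variable, as in the direct C5.0() call.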
