I would like to get the names of the features being used in my pruned RPART tree. I can read the names off the plotted pruned tree, but I prefer a vector of names. Is there a way to do this?

Neal Oden

1 Answer


It takes a bit of effort, but you can get this. Since you did not provide an example, I will illustrate with some built-in data.

library(rpart)
library(partykit)       ## for nicer plotting
IRP = rpart(Species ~ ., data=iris)
plot(as.party(IRP))

[Plot: iris tree]

One way to get at the variables used for the splits is through the function labels.

labels(IRP)
[1] "root"               "Petal.Length< 2.45" "Petal.Length>=2.45"
[4] "Petal.Width< 1.75"  "Petal.Width>=1.75"

It is easy to ignore the first (root) node, but we need to clean up the text for the other splits. We can use sub and a regular expression to get just the variable names.

VPat = paste0(".*(", paste(colnames(iris), collapse="|"), ").*")
sub(VPat,"\\1", labels(IRP)[-1])
[1] "Petal.Length" "Petal.Length" "Petal.Width"  "Petal.Width" 

If you want, you can apply unique to this to just get each variable name once.
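
As an alternative sketch that avoids the regular expression, the fitted rpart object stores the split variable for each node in its frame component, with terminal nodes marked "<leaf>". The prune call and the value cp = 0.1 below are only assumptions for illustration, since the question does not show how the tree was pruned; the names pruned, split_vars and used_vars are likewise made up for this example.

```r
library(rpart)

IRP <- rpart(Species ~ ., data = iris)

# Illustrative pruning step; cp = 0.1 is an arbitrary value for this sketch
pruned <- prune(IRP, cp = 0.1)

# frame$var holds the split variable of each node; terminal nodes are "<leaf>"
split_vars <- as.character(pruned$frame$var)

# Drop the leaves and keep each variable name once
used_vars <- unique(split_vars[split_vars != "<leaf>"])
used_vars
```

This returns a plain character vector, and because it reads the variable names directly rather than parsing split labels, it should not be confused by one column name being contained in another.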

G5W