I would like to get the names of the features being used in my pruned RPART tree. I can read the names off the plotted pruned tree, but I prefer a vector of names. Is there a way to do this?
1 Answer
It takes a bit of effort, but you can get this. Since you do not provide any example, I will illustrate with some built-in data.
library(rpart)
library(partykit) ## for nicer plotting
IRP = rpart(Species ~ ., data=iris)
plot(as.party(IRP))
One way to get at the variables used for the splits
is through the function labels.
labels(IRP)
[1] "root" "Petal.Length< 2.45" "Petal.Length>=2.45"
[4] "Petal.Width< 1.75" "Petal.Width>=1.75"
It is easy to ignore the first (root) node, but we need
to clean up the text for the other splits. We can use sub
and a regular expression to get just the variable names.
VPat = paste0(".*(", paste(colnames(iris), collapse="|"), ").*")
sub(VPat,"\\1", labels(IRP)[-1])
[1] "Petal.Length" "Petal.Length" "Petal.Width" "Petal.Width"
If you want, you can apply unique to this to get each
variable name just once.
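As a cross-check on the regex approach, the fitted rpart object also records the split variable for each node in its frame component, with terminal nodes marked "<leaf>". The following is only a minimal sketch under that assumption: split_vars is a hypothetical helper name (it is not part of rpart), and the cp value passed to prune is an arbitrary illustration, not a recommendation.

```r
library(rpart)

# Hypothetical helper (not part of rpart): returns the variables used
# for splits in a fitted or pruned rpart tree, each name once.
split_vars <- function(fit) {
  vars <- as.character(fit$frame$var)  # one entry per node; leaves are "<leaf>"
  unique(vars[vars != "<leaf>"])
}

IRP <- rpart(Species ~ ., data = iris)

# Regex approach from above, reduced to unique names
VPat <- paste0(".*(", paste(colnames(iris), collapse = "|"), ").*")
unique(sub(VPat, "\\1", labels(IRP)[-1]))

# Same information read directly from the tree's frame
split_vars(IRP)

# The helper takes a pruned tree the same way; cp = 0.1 is an arbitrary value
IRP_pruned <- prune(IRP, cp = 0.1)
split_vars(IRP_pruned)
```

Reading from the frame avoids depending on the exact text of the split labels, which may matter if one variable name happens to be contained in another.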

G5W