
I am running into a labelling issue when using rpart in R.

Here's my situation.

I'm working with a dataset of categorical variables. Here's an extract of my data:

head(Dataset)
Entity  IL  CP  TD  Budget
     2   1   3   2     250
     5   2   2   1     663
     6   1   2   3     526
     2   3   1   2     522

When I plot my decision tree and add the labels using

plot(tree) 
text(tree)

I get the wrong labels: for Entity, I get "abcd" instead of the actual values.

Why does this happen, and how can I fix it?

Thank you for your help

Layale

1 Answer


By default, text.rpart (the method text() calls for rpart objects) labels the levels of factor variables with letters: the first level is a, the second b, and so on. Example:

library(rpart)
library(ggplot2) # for the diamonds data

data("diamonds")
df <- diamonds[1:2000, ]

# color, cut and clarity are all factors
fit <- rpart(price ~ color + cut + clarity, data = df)
plot(fit)
text(fit)  # factor levels are shown as letters

[Figure: plot(fit) + text(fit) output, with factor levels labelled a, b, c, ...]
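If you would rather stay with base graphics, text.rpart also takes a `pretty` argument: `pretty = 0` turns off abbreviation so the real level names are printed instead of letters. The sketch below swaps in R's built-in `warpbreaks` data (my substitution, chosen because its `wool` and `tension` columns are factors and it needs no extra package) for `diamonds`; the `setNames()` line is just a quick way to see which letter stands for which level.

```r
library(rpart)

# warpbreaks ships with R; 'wool' and 'tension' are factors,
# standing in for categorical predictors like the question's Entity
fit <- rpart(breaks ~ wool + tension, data = warpbreaks)

# Letters follow level order: first level = a, second = b, ...
levels(warpbreaks$tension)  # "L" "M" "H"
setNames(levels(warpbreaks$tension),
         letters[seq_along(levels(warpbreaks$tension))])  # a = "L", b = "M", c = "H"

# Default: split labels use level letters
plot(fit, margin = 0.1)
text(fit)

# pretty = 0: no abbreviation, full level names are printed
plot(fit, margin = 0.1)
text(fit, pretty = 0)

# The same split labels as character vectors (labels() has an rpart method)
default_labs <- labels(fit)
full_labs    <- labels(fit, pretty = 0)
print(default_labs)
print(full_labs)
```

For the question's data the same idea applies: `text(tree, pretty = 0)`, provided Entity and the other columns are stored as factors.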

In my opinion, rather than customizing this plot, you should use rpart.plot, a package dedicated to plotting rpart trees:

library(rpart.plot)
prp(fit)

[Figure: prp(fit) output]

It has many customization options, for example:

prp(fit,
    type = 4,
    extra = 101,
    fallen.leaves = TRUE,
    box.palette = colorRampPalette(c("red", "white", "green3"))(10),
    round = 2,
    branch.lty = 2,
    branch.lwd = 1,
    space = -1,
    varlen = 0,   # 0 = do not abbreviate variable names
    faclen = 0)   # 0 = do not abbreviate factor level names

[Figure: customized prp(fit) output]

Another option is:

library(rattle)
fancyRpartPlot(fit,
               type = 4)

[Figure: fancyRpartPlot(fit, type = 4) output]

which uses prp internally with different defaults.

missuse