I have a very simple data frame:

     TYPE USERS  VISITS SIZE
1   no       3     5 118266
2   no       3     5 118548
3   yes      1     0 274558
4   no       3    10  86078
5   yes      3     4 355091
7   yes      18     0  29915
8   yes      6     0 278590
9   yes      5     0 477850
10  yes      1     2  67751
11  yes      4     9 309361

When I fit a ctree classification tree for the TYPE variable:

plot(ctree(TYPE ~ ., data = df))

Three labels seem to appear, but I can't be sure because no labels are printed beneath the terminal nodes of the plot.

[Figure: ctree plot]

Why are there three end states if I only have two classes (yes, no)? And why are the labels missing?

David Arenburg
useRj
  • Instead of adding tags like `rstudio` (which add nothing to the problem), it is better to specify the packages you are using in order to help people reproduce your problem. Regarding your problem, you are probably running this on a subset and you have empty levels there; see `?droplevels`. Hence the tree performed an "ordinal regression" analysis instead of a "classification" one, because it thought you don't have a binary variable. Simply by running `?ctree` and checking the examples there, you should be able to understand the difference. – David Arenburg May 11 '16 at 09:34
  • Thanks for the tip. Double-checked: there are no empty levels. All rows are of the yes/no type – useRj May 11 '16 at 10:13
  • How did you check that there are no empty levels? What does `levels(df$TYPE)` give you? When running this on your data set above I'm getting a different tree. – David Arenburg May 11 '16 at 10:16
  • `levels(df$TYPE)` gives `[1] "no" "yes"` – useRj May 11 '16 at 10:32
  • And you get that tree on the data set you've posted in comments? Maybe restart R. Something seems to be wrong with your session. – David Arenburg May 11 '16 at 10:33

1 Answer

As already pointed out by @DavidArenburg, the data `df` you used for growing the tree almost surely had a `TYPE` variable with three levels, although only two of these actually occurred in the observed data. See below for a reproducible example based on the print-out you provided.
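To illustrate how such an unused level can arise, here is a minimal base R sketch. The data frame `full`, its third level `"maybe"`, and the subset `sub` are hypothetical and only stand in for whatever preprocessing produced your `df`:

```r
# Hypothetical data: TYPE starts out with three levels
full <- data.frame(
  TYPE  = factor(c("no", "yes", "maybe", "yes", "no", "maybe")),
  USERS = c(3, 1, 2, 6, 3, 4)
)

# Subsetting removes the rows but keeps the level in the factor
sub <- full[full$TYPE != "maybe", ]
n_before <- nlevels(sub$TYPE)  # still counts the unused "maybe" level
table(sub$TYPE)                # shows a zero count for "maybe"

# droplevels() discards levels that no longer occur in the data
sub$TYPE <- droplevels(sub$TYPE)
n_after <- nlevels(sub$TYPE)
```

Running `table(df$TYPE)` on the data actually passed to `ctree()` is a quick way to spot a level with a zero count.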

As for the problem that the levels are not visible in the plot: the plotting window you used is too small for the default font size, so labels that would overlap each other are suppressed. The easiest solution is to simply increase the size of the plotting window. Alternatively, you can decrease the font size. See below for an example.
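The first remedy, a larger plotting area, can be sketched by opening a graphics device with explicit dimensions. This is only an illustration: it uses the built-in `iris` data rather than the data from the question, and the object name `ct_demo`, the file name, and the pixel dimensions are arbitrary choices.

```r
library("partykit")

# Illustrative tree on built-in data (not the data from the question)
ct_demo <- ctree(Species ~ ., data = iris)

# Open a PNG device large enough for the terminal node labels
# (width, height, and resolution are arbitrary choices)
out_file <- file.path(tempdir(), "ctree-large.png")
png(out_file, width = 1600, height = 1000, res = 120)
plot(ct_demo)
dev.off()
```

If the labels are still missing at this size, increasing the dimensions further or reducing the font size are the next things to try.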

Read the data:

df <- read.table(textConnection("     TYPE USERS  VISITS SIZE
1   no       3     5 118266
2   no       3     5 118548
3   yes      1     0 274558
4   no       3    10  86078
5   yes      3     4 355091
7   yes      18     0  29915
8   yes      6     0 278590
9   yes      5     0 477850
10  yes      1     2  67751
11  yes      4     9 309361
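"), header = TRUE, stringsAsFactors = TRUE)  # TYPE must be a factor; R >= 4.0 no longer converts strings by default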

And then grow and visualize the tree:

library("partykit")
ct <- ctree(TYPE ~ ., data = df)
plot(ct)

[Figure: ctree-default]

As you can see, a ctree with a binary response is displayed using stacked bars. To obtain bars plotted side by side, you need to modify the arguments for the terminal panel function accordingly:

plot(ct, tp_args = list(beside = TRUE))

[Figure: ctree-beside]

Finally, to change the size of the labels, you can alter the grid graphical parameters. (Note that this requires the partykit rather than the party implementation of ctree().)

plot(ct, tp_args = list(beside = TRUE), gp = gpar(fontsize = 33))

[Figure: ctree-fontsize]

Achim Zeileis