
The original data set has 7499 observations of 19 variables. I'm using the tree package in R to build a classification tree. The result seems reasonable, and the plot is drawn successfully:

library(tree)
tree.data = tree(Y~., data.train, control = tree.control(dim(data)[1], mincut = 10, minsize = 20, mindev = 0.001))
plot(tree.data)
text(tree.data, pretty = 0,cex=0.6)

[Figure: plot of the fitted classification tree]

However, when I try to use cv.tree to prune the tree, an error occurs.

cv.data = cv.tree(tree.data, FUN = prune.misclass)
Error in prune.tree(tree = list(frame = list(var = 1L, n = 6732, dev = 9089.97487458261,  : 
  can not prune singlenode tree

Then I checked the structure of tree.data.

summary(tree.data)

Classification tree:
tree(formula = Y ~ ., data = data.train, control = tree.control(dim(data)[1], 
    mincut = 10, minsize = 20, mindev = 0.001))
Variables actually used in tree construction:
 [1] "X2"  "X1"  "X6"  "X13" "X5"  "X10" "X14" "X16" "X17" "X3"  "X7"  "X15" "X11" "X18"
[15] "X8"  "X12"
Number of terminal nodes:  45 
Residual mean deviance:  1.24 = 9243 / 7454 
Misclassification error rate: 0.3475 = 2606 / 7499 

This is not a single-node tree, so I'm confused about why this error appears.

Marco Sandri
Skye

1 Answer

This error is raised inside cv.tree when a tree is completely pruned, so that only the root node remains. I can reproduce your error by generating a set of X variables that are not associated with Y.

library(tree)

# Data generating process
# Y is NOT associated to any X variables
set.seed(1234)
X <- matrix(rnorm(7499*18), ncol=18)
Y <- rbinom(7499, 1, 0.5)
data <- data.frame(Y=factor(Y, labels=c("No","Yes")), X)
idx <- sample(1:nrow(data), 6000)
data.train <- data[idx,]   
# Train the tree
tree.data = tree(Y~., data.train, 
            control=tree.control(dim(data)[1], mincut = 10, minsize = 20, mindev = 0.001))
plot(tree.data)
text(tree.data, pretty = 0,cex=0.6)    
# Pruning by cv.tree 
cv.data = cv.tree(tree.data, FUN = prune.misclass)

And the error message is:

Error in prune.tree(tree = list(frame = list(var = 1L, n = 4842, dev = 6712.03745626047, : can not prune singlenode tree
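
If a script should keep running when cross-validation hits this case, one option is to wrap the call in tryCatch. The following is only a sketch under assumptions: the helper name safe_cv_tree is hypothetical (it is not part of the tree package), and returning NULL on failure is just one possible fallback.

```r
library(tree)

# Hypothetical helper (not part of the tree package): run cv.tree and
# return NULL instead of stopping if the call raises an error.
safe_cv_tree <- function(fit, ...) {
  tryCatch(
    cv.tree(fit, ...),
    error = function(e) {
      message("cv.tree failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Same noise-only setup as above: Y is not associated with any X variable
set.seed(1234)
X <- matrix(rnorm(7499 * 18), ncol = 18)
Y <- rbinom(7499, 1, 0.5)
data <- data.frame(Y = factor(Y, labels = c("No", "Yes")), X)
data.train <- data[sample(1:nrow(data), 6000), ]

tree.data <- tree(Y ~ ., data.train,
                  control = tree.control(nrow(data.train), mincut = 10,
                                         minsize = 20, mindev = 0.001))

# Either a pruning sequence or NULL, depending on whether cv.tree succeeded
cv.data <- safe_cv_tree(tree.data, FUN = prune.misclass)
if (is.null(cv.data)) {
  message("No cross-validated pruning sequence available; keeping the unpruned tree.")
}
```

Catching the error does not make the predictors informative; it only lets the calling code decide what to do when no pruning sequence can be computed.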

Suppose now that X1 is associated with Y.

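# Data generating process
# Y IS associated with X1: Y is TRUE when X1 exceeds a 0/1 noise term
set.seed(1234)
X <- matrix(rnorm(7499*18), ncol=18)
# Parentheses make the precedence explicit: `+` is evaluated before `>`
Y <- X[,1] > (0 + rbinom(7499, 1, 0.2))
data <- data.frame(Y=factor(Y, labels=c("No","Yes")), X)
idx <- sample(1:nrow(data), 6000)
data.train <- data[idx,]
# Retrain the tree on the new data
tree.data = tree(Y~., data.train, 
            control=tree.control(dim(data)[1], mincut = 10, minsize = 20, mindev = 0.001))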

After the tree is retrained on this data, the cv.tree command no longer throws an error:

# Pruning by cv.tree
cv.data = cv.tree(tree.data, FUN = prune.misclass)
pruned.tree <- prune.tree(tree.data, k=cv.data$k[3])
plot(pruned.tree)
text(pruned.tree, pretty = 0, cex=0.6)
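
The k = cv.data$k[3] above simply takes the third entry of the pruning sequence. A common alternative is to keep the tree size with the lowest cross-validated error. The sketch below assumes the data in which X1 is associated with Y; the names best.size and best.tree are illustrative only. With FUN = prune.misclass, the dev component returned by cv.tree holds the cross-validated number of misclassifications.

```r
library(tree)

# Data in which Y depends on X1 (same process as above)
set.seed(1234)
X <- matrix(rnorm(7499 * 18), ncol = 18)
Y <- X[, 1] > rbinom(7499, 1, 0.2)
data <- data.frame(Y = factor(Y, labels = c("No", "Yes")), X)
data.train <- data[sample(1:nrow(data), 6000), ]

tree.data <- tree(Y ~ ., data.train,
                  control = tree.control(nrow(data.train), mincut = 10,
                                         minsize = 20, mindev = 0.001))

cv.data <- cv.tree(tree.data, FUN = prune.misclass)

# Size (number of terminal nodes) with the lowest cross-validated error
best.size <- cv.data$size[which.min(cv.data$dev)]

# Only prune when cross-validation keeps at least one split
if (best.size > 1) {
  best.tree <- prune.misclass(tree.data, best = best.size)
  print(summary(best.tree))
}
```

The best.size > 1 guard is there because a selected size of 1 would mean cross-validation found no split worth keeping.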

[Figure: plot of the pruned tree]

Marco Sandri
  • Thank you so much !!!! I have been searching for days to find out why. Your answer completely solves my problem. – Skye Nov 25 '21 at 17:05