I'm trying to perform classification with rpart on a dataset with 16 variables and 420 observations (a subset of the http://archive.ics.uci.edu/ml/datasets/Arrhythmia dataset; I kept only certain variables and excluded observations with missing values).

The code I'm running is below. The issue is that it seems to be stuck in an infinite loop:

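library(rpart)
# Convert the prepared subset to a data frame
newdata_frame <- data.frame(newdata)
# Fit a classification tree using all other columns as predictors
tree <- rpart(class ~ ., data = newdata_frame, method = "class")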

I'm quite new to rpart, so I don't have many ideas on how to solve this. I tried running "tree" on the same dataset and it works fine.

Any ideas on why rpart could get stuck in an infinite loop? Thanks for the help! Appreciated! L.

1 Answer

The problem may be related to the fact that some of the classes have very few observations (and some have 0, but these are ignored). Since you say tree works fine, I assume the slowness is somewhere in the pruning phase of the rpart algorithm. This is the phase where the tree has already been built, but the rpart algorithm decides to reduce overfitting by removing some of the partitions (branches).
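One way to check this hypothesis is sketched below, on synthetic stand-in data rather than the real Arrhythmia subset (the data frame `toy`, its predictors, and the class proportions are all made up for illustration). It counts observations per class with table(), then fits with rpart's cross-validation switched off via rpart.control(xval = 0). By default rpart runs 10-fold cross-validation to fill the complexity table used for pruning, so if the fit returns quickly with xval = 0, that step is a plausible place for the time to go.

```r
library(rpart)

# Synthetic stand-in for the asker's data (an assumption, not the real subset):
# 420 rows, 15 numeric predictors, and a class label with some rare levels.
set.seed(42)
n <- 420
toy <- data.frame(matrix(rnorm(n * 15), nrow = n))
toy$class <- factor(sample(c(1, 2, 3, 10, 16), n, replace = TRUE,
                           prob = c(0.55, 0.20, 0.15, 0.07, 0.03)))

# How many observations does each class have?
class_counts <- table(toy$class)
print(class_counts)

# Fit without cross-validation and time the call
timing <- system.time(
  fit_no_cv <- rpart(class ~ ., data = toy, method = "class",
                     control = rpart.control(xval = 0))
)
print(timing)
```

On your own data, replace `toy` with newdata_frame and compare the timing against a fit with the default settings.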

A quick fix may be to predict only whether a subject suffers from any form of arrhythmia (i.e., class 1 vs. the rest).
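A minimal sketch of that recoding, again on made-up data (the data frame `demo`, the new column `arrhythmia`, and the labels "normal"/"arrhythmia" are hypothetical names, and it assumes class 1 is the "no arrhythmia" label):

```r
library(rpart)

# Synthetic stand-in data (an assumption): 420 rows, 15 numeric predictors,
# and a numeric class code where 1 is taken to mean "normal".
set.seed(1)
n <- 420
demo <- data.frame(matrix(rnorm(n * 15), nrow = n))
demo$class <- sample(c(1, 2, 3, 10, 16), n, replace = TRUE,
                     prob = c(0.55, 0.20, 0.15, 0.07, 0.03))

# Collapse the multi-class label into a two-level factor: class 1 vs. the rest
demo$arrhythmia <- factor(ifelse(demo$class == 1, "normal", "arrhythmia"))

# Drop the original label so it cannot be used as a predictor
binary_frame <- demo[, setdiff(names(demo), "class")]

# Fit the two-class tree
binary_tree <- rpart(arrhythmia ~ ., data = binary_frame, method = "class")
print(binary_tree)
```

With the real data, the same two steps (recode, then drop the original class column) apply to newdata_frame.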