
I have a large data set with 100k data fields. When I run str() or view the full data, nothing goes wrong. But when I run rpart on the training set, it runs for a while, and after about 3-4 minutes it shows the following error:

Error: Unable to establish connection with R session

My script looks like this:

```r
# Decision tree
library(rpart)    # provides rpart() and its predict() method
library(rattle)   # not used by the code below
library(party)    # not used by the code below

# read.csv() already assumes a header row and comma separators
train_set <- read.csv('my_sample_trainset.csv', header = TRUE, stringsAsFactors = FALSE)
test_set  <- read.csv('my_sample_testset.csv',  header = TRUE, stringsAsFactors = FALSE)

# Classification tree predicting Route from three predictors
my_trained_tree <- rpart(Route ~ Bus_Id + week_days + time_slot, data = train_set, method = "class")
# Error occurs on/after this line

# Predicted class labels for the test set
my_prediction <- predict(my_trained_tree, test_set, type = "class")

my_solution <- data.frame(Route = my_prediction)

write.csv(my_solution, file = "solution.csv", row.names = FALSE)
```

Am I missing a library, or does this happen because of the large data set (6.5 MB)?

I am using RStudio version 0.99.447 on Mac OS X Yosemite.
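To check whether size alone is the problem, a quick sketch like the following inspects the row and column counts, the memory footprint, and the number of distinct values in each column. The data frame is synthetic (hypothetical bus IDs, days, time slots and routes) only so the snippet runs on its own; with the real data you would run the last three statements on `train_set`. The reason for counting distinct values is an assumption, not something confirmed in this thread: rpart treats character and factor predictors as categorical, and a categorical predictor with many distinct values (a bus ID, for instance) can make the split search much slower than the row count or file size alone would suggest.

```r
# Sketch: inspect size and predictor cardinality before fitting rpart.
# The data below is synthetic and hypothetical; substitute the real train_set.
set.seed(1)
n <- 1000
train_set <- data.frame(
  Bus_Id    = sample(sprintf("B%03d", 1:200), n, replace = TRUE),
  week_days = sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), n, replace = TRUE),
  time_slot = sample(c("morning", "afternoon", "evening"), n, replace = TRUE),
  Route     = sample(c("R1", "R2", "R3"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Rows and columns, and approximate in-memory size
print(dim(train_set))
print(format(object.size(train_set), units = "Mb"))

# Number of distinct values per column; categorical predictors with many
# distinct values are the likeliest cause of a slow split search
n_levels <- vapply(train_set, function(col) length(unique(col)), integer(1))
print(n_levels)
```

If one predictor shows hundreds of distinct values while the others show only a handful, that column is the first one worth dropping or regrouping as an experiment.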

Roman Luštrik
Dinal24

1 Answer


That message means that R is still calculating the results. If you open Activity Monitor and sort the CPU tab by CPU usage, you should see that rsession is using 100% of a CPU. So you can simply click "OK" on that message and let R keep computing.

I wish there were a workaround, though; this issue is plaguing me as we speak!
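One possible mitigation, sketched here as an assumption rather than a fix for the RStudio message itself, is to time rpart on a small random subsample first and to switch off rpart's built-in cross-validation with `rpart.control(xval = 0)` (the default is 10 cross-validation runs, each of which refits a tree). The data frame is synthetic and hypothetical so the snippet is self-contained; the 10% subsample size is an arbitrary choice.

```r
library(rpart)

# Synthetic, hypothetical stand-in for the real training data
set.seed(42)
n <- 5000
train_set <- data.frame(
  Bus_Id    = factor(sample(sprintf("B%02d", 1:10), n, replace = TRUE)),
  week_days = factor(sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), n, replace = TRUE)),
  time_slot = factor(sample(c("morning", "afternoon", "evening"), n, replace = TRUE)),
  Route     = factor(sample(c("R1", "R2", "R3"), n, replace = TRUE))
)

# Fit on a random 10% subsample and time it before attempting the full data
idx <- sample(nrow(train_set), size = nrow(train_set) %/% 10)
timing <- system.time(
  small_tree <- rpart(Route ~ Bus_Id + week_days + time_slot,
                      data = train_set[idx, ], method = "class",
                      control = rpart.control(xval = 0))  # skip cross-validation
)
print(timing[["elapsed"]])  # seconds for the subsample fit
```

If the subsample fit is quick, increasing the subsample size in steps gives a rough sense of how long the full fit will take, so you know whether to keep waiting. Note that with `xval = 0` the tree's cp table has no cross-validated error column, so cp-based pruning would need a separate run with cross-validation turned back on.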

Chris Kennedy