I have a large data set with 100k data fields. Calling str() on it or viewing the full data works without any glitch, but when I run rpart on the training set it runs for a while, and after about 3-4 minutes the following error appears:
Error: Unable to establish connection with R session
My script looks like this:
# Decision tree
library(rpart)
library(rattle)
library(party)
train_set <- read.table('my_sample_trainset.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
test_set <- read.table('my_sample_testset.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
my_trained_tree <- rpart(Route ~ Bus_Id + week_days + time_slot, data=train_set, method="class")
# Error occurs on/after this line
my_prediction <- predict(my_trained_tree, test_set, type = "class")
my_solution <- data.frame(Route = my_prediction)
write.csv(my_solution, file = "solution.csv", row.names = FALSE)
Am I missing a library, or does this happen because of the size of the data set (6.5 MB)?
Also, I am using RStudio version 0.99.447 on Mac OS X Yosemite.
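In case it helps with diagnosis, here is a minimal sketch of how the number of distinct values in each column of the formula could be checked. With method="class", a categorical predictor that has many distinct values (Bus_Id might be one, though that is unconfirmed) can make rpart search a very large number of candidate splits, so these counts seem worth looking at. The helper count_levels and the demo data frame are hypothetical stand-ins, not part of the original script; on the real data the call would use train_set instead of demo.

```r
# Hypothetical helper (not in the original script): number of distinct
# values in each of the given columns of a data frame.
count_levels <- function(df, cols) {
  sapply(df[cols], function(x) length(unique(x)))
}

# Small made-up data frame standing in for train_set (assumed column names
# taken from the rpart formula above).
demo <- data.frame(
  Route     = c("A", "B", "A", "C"),
  Bus_Id    = c("b1", "b2", "b3", "b4"),
  week_days = c("Mon", "Mon", "Tue", "Tue"),
  time_slot = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Named vector of distinct-value counts, one entry per column
level_counts <- count_levels(demo, c("Route", "Bus_Id", "week_days", "time_slot"))
print(level_counts)
```

If Bus_Id or Route turns out to have a very large count on the real training set, that would be useful detail to add to the question.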