I am trying to build a decision tree with rpart, but it is taking too long to complete.
I am not sure whether I need to reduce the dimensionality of the dataset or the number of factor levels in any of its features.
Below you will find the head() and str() output for the dataset. Also, this is the link to it.
   Funct.Area Environment ServiceType Ticket.Nature SLA.Result..4P. IRIS.Priority
2         FUN         DCF         FUN            SR              OK    Priority 3
5         APS         DCF         APS            SR          Defect    Priority 3
7         SEC         DCF         SEC            SR              OK    Priority 4
8         SEC         DCF         SEC            SR          Defect    Priority 4
9         FUN         DCF         FUN            SR              OK    Priority 3
10        SEC         DCF         SEC            SR              OK    Priority 3
'data.frame': 69250 obs. of 6 variables:
$ Funct.Area : Factor w/ 27 levels "0","812","APS",..: 13 3 26 26 13 26 26 26 26 26 ...
$ Environment : Factor w/ 29 levels " WS","812","BULK",..: 9 9 9 9 9 9 9 9 9 9 ...
$ ServiceType : Factor w/ 21 levels "APS","BULK","CNC",..: 8 1 18 18 8 18 18 18 18 18 ...
$ Ticket.Nature : Factor w/ 5 levels "BULK","CHG","HK",..: 5 5 5 5 5 5 5 5 5 5 ...
$ SLA.Result..4P.: Factor w/ 5 levels "#¡REF!","#N/A",..: 5 3 5 3 5 5 5 5 5 5 ...
$ IRIS.Priority : Factor w/ 4 levels "Priority 1","Priority 2",..: 3 3 4 4 3 3 3 3 4 4 ...
My understanding is that the rpart package can handle categorical variables with up to 32 different levels.
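To put the level counts in perspective, here is a rough sketch of how many distinct two-group partitions exist for a factor with k unordered levels, namely 2^(k-1) - 1. It assumes (this is not confirmed anywhere in the post) that the response has more than two classes and that the tree builder has to consider every such partition for a categorical predictor. Which column is the response is not stated, so the four columns picked below are only illustrative, and the function name `n_binary_splits` is made up for this example.

```r
# Level counts copied from the str() output above.
# Assumption: these four columns are used as predictors.
n_levels <- c(Funct.Area = 27, Environment = 29,
              ServiceType = 21, Ticket.Nature = 5)

# Hypothetical helper: number of ways to divide k unordered levels
# into two non-empty groups (left branch vs. right branch).
n_binary_splits <- function(k) 2^(k - 1) - 1

splits <- n_binary_splits(n_levels)

# Show the counts with thousands separators for readability.
print(format(splits, big.mark = ",", scientific = FALSE))
```

If that assumption holds, the 27- and 29-level factors would dominate the run time, while Ticket.Nature with its 5 levels would be negligible.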
Is there any way to reduce processing time?
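For reference, this is a minimal sketch of what reducing the number of factor levels could look like, using base R only. The helper `collapse_rare_levels`, the threshold `min_n`, and the "Other" label are all hypothetical, and the toy vector merely stands in for a column such as Environment; none of the counts come from the real dataset.

```r
# Hypothetical helper: merge every level that occurs fewer than
# min_n times into a single catch-all level.
collapse_rare_levels <- function(f, min_n = 100, other = "Other") {
  f <- droplevels(as.factor(f))          # drop levels that never occur
  counts <- table(f)                     # frequency of each level
  rare <- names(counts)[counts < min_n]  # levels below the threshold
  # Assigning the same name to several levels merges them into one.
  levels(f)[levels(f) %in% rare] <- other
  f
}

# Toy data (made-up counts), standing in for one factor column.
x <- factor(rep(c("DCF", "BULK", "WS", "812"),
                times = c(900, 70, 20, 10)))

y <- collapse_rare_levels(x, min_n = 50)

print(nlevels(x))  # number of levels before collapsing
print(nlevels(y))  # number of levels after collapsing
print(table(y))
```

Whether collapsing levels this way is acceptable depends on whether the rare categories carry information that matters for the prediction.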
Here is the link to the R script.