
I am trying to build a decision tree with rpart, but it takes far too long to complete.

I am not sure whether I need to reduce the dimensionality of the dataset or the number of factor levels in some of its features.

Below are the head() and str() output for the dataset. Also, this is the link to it.

   Funct.Area Environment ServiceType Ticket.Nature SLA.Result..4P. IRIS.Priority
2         FUN         DCF         FUN            SR              OK    Priority 3
5         APS         DCF         APS            SR          Defect    Priority 3
7         SEC         DCF         SEC            SR              OK    Priority 4
8         SEC         DCF         SEC            SR          Defect    Priority 4
9         FUN         DCF         FUN            SR              OK    Priority 3
10        SEC         DCF         SEC            SR              OK    Priority 3

'data.frame':   69250 obs. of  6 variables:
 $ Funct.Area     : Factor w/ 27 levels "0","812","APS",..: 13 3 26 26 13 26 26 26 26 26 ...
 $ Environment    : Factor w/ 29 levels " WS","812","BULK",..: 9 9 9 9 9 9 9 9 9 9 ...
 $ ServiceType    : Factor w/ 21 levels "APS","BULK","CNC",..: 8 1 18 18 8 18 18 18 18 18 ...
 $ Ticket.Nature  : Factor w/ 5 levels "BULK","CHG","HK",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ SLA.Result..4P.: Factor w/ 5 levels "#¡REF!","#N/A",..: 5 3 5 3 5 5 5 5 5 5 ...
 $ IRIS.Priority  : Factor w/ 4 levels "Priority 1","Priority 2",..: 3 3 4 4 3 3 3 3 4 4 ...

My understanding is that the rpart package can handle categorical variables with up to 32 levels.

Is there any way to reduce processing time?

Here is the link to the R script.

nariver1
  • For factor variables, all 2^(k−1) − 1 possible splits (k = number of levels) are tested. 6 factors * many levels * 70k observations == "very slow". Try collapsing some levels, converting some factors to ordered factors where that makes sense, or using another method. – missuse Mar 12 '18 at 14:26
  • https://stackoverflow.com/questions/17195021/computational-time-for-categorical-vs-continuous-regressors I think this might answer your question. – stas g Mar 12 '18 at 15:17
  • Hi, I tried this, as suggested by stas g, and it seems to be working. I should read more about the parameters of the rpart function. Any link would be useful. Thanks. – nariver1 Mar 14 '18 at 16:09
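
To make the level-collapsing suggestion from the comments concrete, here is a minimal base R sketch. It only illustrates the 2^(k−1) − 1 count quoted above and one possible way to lump rare levels together. The helper names `n_splits` and `collapse_rare_levels`, the `"Other"` label, and the toy data are assumptions made for illustration, not part of the original script or of rpart itself. By that formula, a 27-level factor such as Funct.Area would have 2^26 − 1 = 67,108,863 candidate splits.

```r
# Number of candidate binary splits for an unordered factor with k levels,
# using the 2^(k-1) - 1 formula quoted in the comment above.
n_splits <- function(k) 2^(k - 1) - 1

# Hypothetical helper: keep the `keep` most frequent levels and
# lump every other level into a single `other` level.
collapse_rare_levels <- function(x, keep = 5, other = "Other") {
  x <- droplevels(as.factor(x))
  counts <- sort(table(x), decreasing = TRUE)
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  y <- as.character(x)
  y[!is.na(y) & !(y %in% top)] <- other
  factor(y)
}

# Toy data standing in for a many-level column (not the real dataset).
set.seed(1)
toy <- factor(sample(c("SEC", "FUN", "APS", letters[1:10]), 1000,
                     replace = TRUE,
                     prob = c(0.4, 0.3, 0.2, rep(0.01, 10))))
small <- collapse_rare_levels(toy, keep = 3)

# Compare level counts and candidate splits before and after collapsing.
cat("before:", nlevels(toy), "levels,", n_splits(nlevels(toy)), "splits\n")
cat("after: ", nlevels(small), "levels,", n_splits(nlevels(small)), "splits\n")
```

Something like `df$Funct.Area <- collapse_rare_levels(df$Funct.Area, keep = 8)` could then be applied to each wide factor before calling rpart, where `df` is a placeholder for the data frame shown in the question. Whether lumping rare levels is acceptable depends on how much those levels matter for the prediction.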

0 Answers