
I have the following decision tree, created with the RWeka package using the command J48(NSP ~ ., data = training):

[[1]]
J48 pruned tree
------------------

MSTV <= 0.4
|   MLTV <= 4.1: 3 (2.0)
|   MLTV > 4.1
|   |   ASTV <= 79
|   |   |   b <= 1383: 2 (18.0)
|   |   |   b > 1383
|   |   |   |   UC <= 5: 1 (2.0)
|   |   |   |   UC > 5: 2 (2.0)
|   |   ASTV > 79: 3 (2.0)
MSTV > 0.4
|   DP <= 0
|   |   ALTV <= 9: 1 (170.0/2.0)
|   |   ALTV > 9
|   |   |   FM <= 7
|   |   |   |   LBE <= 142: 1 (27.0/1.0)
|   |   |   |   LBE > 142
|   |   |   |   |   AC <= 2
|   |   |   |   |   |   e <= 1058: 1 (5.0)
|   |   |   |   |   |   e > 1058
|   |   |   |   |   |   |   DL <= 4: 2 (9.0/1.0)
|   |   |   |   |   |   |   DL > 4: 1 (2.0)
|   |   |   |   |   AC > 2: 1 (3.0)
|   |   |   FM > 7: 2 (2.0)
|   DP > 0
|   |   DP <= 1
|   |   |   UC <= 3: 2 (4.0/1.0)
|   |   |   UC > 3
|   |   |   |   MLTV <= 0.4: 3 (2.0)
|   |   |   |   MLTV > 0.4: 1 (8.0)
|   |   DP > 1: 3 (8.0)

Number of Leaves  : 16

Size of the tree : 31

I would like to extract the nodes' values in two formats. The first format contains only the attribute names (MSTV, MLTV, DP, etc.), with each level of the tree nested inside its parent and '(' used as the separator between levels, for example:

(MSTV (MLTV...) (DP...) )

In the second format, I would like to get each node together with its split value, for example:

(MSTV 0.4 (MLTV 4.1 ....) (DP 0..... ) )

How can I extract the relevant information? I think that, to separate the node values, we could strip characters with gsub("[A-Z]:", "", string), but we would also need to ignore the last lines of the output. Thanks a lot for your help.
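To make the two target formats concrete, here is a minimal base-R sketch of one possible approach. It assumes the printed tree is available as a character vector of lines in the usual J48 text layout (for example via capture.output(print(model)), which is an assumption and not something I have verified for this object), that every split is a numeric "<=" / ">" pair, and that the nesting depth equals the number of "|" characters before the attribute name. The names tree_lines, parse_j48 and to_sexpr are hypothetical helpers made up for this illustration, and the sample tree is a shortened toy version, not the full tree above.

```r
# Toy input in the J48 text layout (shortened; not the full tree).
tree_lines <- c(
  "MSTV <= 0.4",
  "|   MLTV <= 4.1: 3 (2.0)",
  "|   MLTV > 4.1",
  "|   |   ASTV <= 79: 2 (18.0)",
  "|   |   ASTV > 79: 3 (2.0)",
  "MSTV > 0.4",
  "|   DP <= 0: 1 (170.0/2.0)",
  "|   DP > 0: 3 (8.0)",
  "",
  "Number of Leaves  : 5"
)

# Hypothetical helper: pull depth, attribute name, operator and split value
# out of every line that looks like a split; other lines are ignored.
parse_j48 <- function(lines) {
  pat <- "^([| ]*)([[:alnum:]_.]+) *(<=|>) *(-?[0-9.]+)"
  m <- regmatches(lines, regexec(pat, lines))
  m <- m[lengths(m) == 5]                 # keep only lines that matched
  nodes <- data.frame(
    depth = vapply(m, function(x) nchar(gsub("[^|]", "", x[2])), integer(1)),
    name  = vapply(m, `[`, character(1), 3),
    op    = vapply(m, `[`, character(1), 4),
    value = vapply(m, `[`, character(1), 5),
    stringsAsFactors = FALSE
  )
  # Each split node is printed twice ("<=" and ">"); keep one copy of each.
  nodes[nodes$op == "<=", ]
}

# Hypothetical helper: turn a depth-annotated outline into nested parentheses.
to_sexpr <- function(depth, label) {
  out <- character(0)
  open <- 0                               # currently open parentheses
  for (i in seq_along(depth)) {
    # close every node that is not an ancestor of the current one
    while (open > depth[i]) {
      out <- c(out, ")")
      open <- open - 1
    }
    out <- c(out, paste0("(", label[i]))
    open <- open + 1
  }
  res <- paste(c(out, rep(")", open)), collapse = " ")
  gsub(" \\)", ")", res)                  # tidy "( X )" into "(X)"
}

nodes <- parse_j48(tree_lines)
fmt_names  <- to_sexpr(nodes$depth, nodes$name)
fmt_values <- to_sexpr(nodes$depth, paste(nodes$name, nodes$value))
print(fmt_names)   # "(MSTV (MLTV (ASTV)) (DP))"
print(fmt_values)  # "(MSTV 0.4 (MLTV 4.1 (ASTV 79)) (DP 0))"
```

Because only lines matching the split pattern are kept, the header and the trailing "Number of Leaves" / "Size of the tree" lines drop out without special handling. Nominal splits of the form "attr = value" are not covered by this sketch.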

Avi
  • I think the first approach to solve it is to divide the tree into columns. Each column includes the required property for the relevant level. For instance, the first column has the head of the tree with the property - MSTV. The next column includes the properties: MLTV, DP and so on. But how can we extract it from the tree in R? – Avi Aug 21 '15 at 09:58
  • Don't put additional info in the comments. Please edit your question. – Jaap Aug 23 '15 at 16:29

0 Answers