How do I make a table from a tree dot file?

For example, take these lines from the dot file:

0 [label="TV <= -0.239\nmse = 25.8\nsamples = 160\nvalue = 14.218"] ;

1 [label="TV <= -1.422\nmse = 7.824\nsamples = 66\nvalue = 10.015"] ;

0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;

2 [label="radio <= 0.549\nmse = 2.58\nsamples = 19\nvalue = 6.805"] ;
1 -> 2 ;

The table I want is:

0,TV,-0.239
1,TV,-1.422
2,radio,0.549
.
.
.

How can I build this table in Python?

JNevill
  • Are you asking how to do this with an existing feature of a Python package, or how to write code to do it yourself? – CryptoFool Mar 11 '22 at 18:21

1 Answer

If you're looking to do this with your own code, applying a regular expression to pick apart each line of the file is straightforward. Here's an example that gives the desired result for your input:

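import re

# Capture the node id, the feature name, and the threshold from lines such as
#   0 [label="TV <= -0.239\nmse = 25.8\n..."] ;
# Edge lines ("0 -> 1 ...") have no [label="..."] attribute, so they never match.
pat = re.compile(r'^(\d+).*?\[label="(\S+)\s+<=\s+(\S+?)\\n')

with open('graph.dot') as f:
    for line in f:
        m = pat.match(line)
        if m:
            # Emit one comma-separated row per split node
            print(",".join(m.groups()))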

Result:

0,TV,-0.239
1,TV,-1.422
2,radio,0.549

I'm not familiar with this file format, so I don't know if you'd need a more sophisticated expression than this one to handle all possible valid inputs. If the above expression doesn't work for all possible lines that you want to map to the resulting table, you should be able to tweak the expression to get the behavior you desire.
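As one possible tweak, here is a minimal sketch that assumes every split node has the layout shown in the question (an id, then `[label="<feature> <= <threshold>\n...`). The function name `parse_split_nodes` and the sample lines are made up for illustration. Compared with the expression above, it tolerates spaces inside the feature name and returns typed rows instead of printing them, so they can be processed further.

```python
import re

# Split nodes look like:  0 [label="TV <= -0.239\nmse = 25.8\n..."] ;
# The "\n" inside the label is a literal backslash followed by "n" in the file.
# Leaf nodes have no "<=" test and edge lines have no [label="..."], so both are skipped.
NODE_RE = re.compile(r'^\s*(\d+)\s*\[label="(.+?)\s*<=\s*([^\\"]+?)\s*\\n')

def parse_split_nodes(lines):
    """Return a list of (node_id, feature, threshold) tuples for the split nodes."""
    rows = []
    for line in lines:
        m = NODE_RE.match(line)
        if m:
            node_id, feature, threshold = m.groups()
            rows.append((int(node_id), feature, float(threshold)))
    return rows

# Illustrative input copied from the question (raw strings keep the literal "\n")
sample = [
    r'0 [label="TV <= -0.239\nmse = 25.8\nsamples = 160\nvalue = 14.218"] ;',
    r'1 [label="TV <= -1.422\nmse = 7.824\nsamples = 66\nvalue = 10.015"] ;',
    r'0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;',
    r'2 [label="radio <= 0.549\nmse = 2.58\nsamples = 19\nvalue = 6.805"] ;',
    r'1 -> 2 ;',
]

for node_id, feature, threshold in parse_split_nodes(sample):
    print(f"{node_id},{feature},{threshold}")
```

To run it on a real file, pass the open file object instead of the sample list, for example `with open('graph.dot') as f: rows = parse_split_nodes(f)`.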

If there is a package that will do this for you so that you don't have to understand the details of the file format, using that would obviously be a cleaner solution. I'm not familiar with this particular problem domain, so I'm not one to tell you if such a thing might exist.

CryptoFool
  • Yes sir, we can call it a group instead of a table: make a group for each row (id, name, value (threshold)) from the dot file. Is that what you mean? – Maryam Azeez Mar 13 '22 at 11:44
  • Based on the dot file that I sent, let's make three groups: group 1 for id 0 and everything that belongs to it (the name and the value), group 2 for id 1 with its name and value, group 3 for id 2 with the name and value from that row, and so on. I hope you understand what I mean. Thank you for your patience. – Maryam Azeez Mar 13 '22 at 11:44
  • I hope you can help me. – Maryam Azeez Mar 13 '22 at 11:49
  • @MaryamAzeez - I think I know about what you want, but maybe not exactly. Please add on to your question to show sample input and output that demonstrates what you want. You can just leave what's already in the question alone and start a new section with a title like "UPDATE:" or "PART 2:" or something like that. Even if all you give me is sample input data, I'll be in better shape to help you. If I understand what you want, then the current sample data isn't enough as it would be better to have some of the groups have more than one entry in them. – CryptoFool Mar 14 '22 at 04:10
  • Thank you so much, sir, I found the solution to my problem. – Maryam Azeez Mar 16 '22 at 15:14
  • But sir, I have a question, please: how can I use a for loop to read the data and predict the leaves for all the data that I have? I use this code, but it reads just one row, not all the data (I have to change the 0 to read another row, and I have 200 rows, so it takes time): test = x1[0, :]; test = test.reshape(1, test.shape[0]); print(test); print(DtReg.predict(test)) – Maryam Azeez Mar 16 '22 at 15:28
  • I don't know what you mean when you say that this code "read just one row". It reads the entire file line by line and outputs something for each line that matches the pattern. If you want to read the whole file and then process it further, you could store what you read in a list or a dictionary and do whatever you want with the data once you've read it in. It sounds like maybe you want a dictionary where the keys are the first number on the line and the values are a list of the data from the lines that start with that number. – CryptoFool Mar 16 '22 at 18:25
  • Again...if you need more help, please add to your question to show what you want to read and what you want to produce as a result. – CryptoFool Mar 16 '22 at 18:28
  • Hello again, sir. I need your help very much with the table that you made. As I told you before, the data is a dot file of a regression tree. I used your code to make the table: 0,TV,-0.239 1,TV,-1.422 2,radio,0.549 5,radio,0.253 8,radio,0.236 9,radio,-0.932 12,TV,0.711 – Maryam Azeez Apr 09 '22 at 18:55
  • In this table 0 is the root node of the tree, 1 is the left branch node, 8 is the right branch, and so on. How can I filter or split the data into a training set and a test set by these nodes, using the number and the value in each row of the table (0 is the number, -0.239 is the value), and then build MLR models from them? Please, sir, I need your help; I'm tired of trying. – Maryam Azeez Apr 09 '22 at 18:56
  • I could not send the data set here! If possible, give me your email. – Maryam Azeez Apr 09 '22 at 19:04
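
The dictionary idea from the comments can be sketched as follows. This is only an illustration under the assumption that the file looks like the lines in the question: node lines carry a `label` made of `key = value` parts separated by a literal `\n`, and edge lines have the form `parent -> child`. The names `read_tree`, `NODE_RE`, and `EDGE_RE` are hypothetical, not part of any library.

```python
import re
from collections import defaultdict

NODE_RE = re.compile(r'^\s*(\d+)\s*\[label="(.*)"\]')   # e.g. 0 [label="..."] ;
EDGE_RE = re.compile(r'^\s*(\d+)\s*->\s*(\d+)')         # e.g. 0 -> 1 [...] ;

def _to_number(text):
    """Convert to float when possible, otherwise keep the stripped string."""
    try:
        return float(text)
    except ValueError:
        return text.strip()

def read_tree(lines):
    """Return (nodes, children): node id -> label fields, parent id -> child ids."""
    nodes = {}
    children = defaultdict(list)
    for line in lines:
        edge = EDGE_RE.match(line)
        if edge:
            parent, child = map(int, edge.groups())
            children[parent].append(child)
            continue
        node = NODE_RE.match(line)
        if node:
            fields = {}
            # The label separates its parts with a literal backslash + "n"
            for part in node.group(2).split(r'\n'):
                if '<=' in part:
                    feature, threshold = part.split('<=', 1)
                    fields['feature'] = feature.strip()
                    fields['threshold'] = _to_number(threshold)
                elif '=' in part:
                    key, value = part.split('=', 1)
                    fields[key.strip()] = _to_number(value)
            nodes[int(node.group(1))] = fields
    return nodes, dict(children)

# Illustrative input copied from the question
sample = [
    r'0 [label="TV <= -0.239\nmse = 25.8\nsamples = 160\nvalue = 14.218"] ;',
    r'1 [label="TV <= -1.422\nmse = 7.824\nsamples = 66\nvalue = 10.015"] ;',
    r'0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;',
    r'2 [label="radio <= 0.549\nmse = 2.58\nsamples = 19\nvalue = 6.805"] ;',
    r'1 -> 2 ;',
]

nodes, children = read_tree(sample)
print(nodes[0]['feature'], nodes[0]['threshold'])
print(children)
```

Keeping the edges alongside the node fields means the parent and child relationships are available later without re-reading the file.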