What is the purpose of using StringIO in DecisionTree

Question

I am writing a decision tree and the following code is a part of the complete code:

def show_tree(tree, features, path):
    f = io.StringIO()
    export_graphviz(tree, out_file=f, feature_names=features)
    pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)
    img = misc.imread(path)
    plt.rcParams['figure.figsize'] = (20,20)
    plt.imshow(img)

Could anyone please tell me what the purpose of using StringIO is here?

Hichem BOUSSETTA · Answer 1 · 2019-05-01T08:55:03.637

1

StringIO represents an in-memory text file. It supports the same read and write operations as a regular text file. Access is faster than with a regular file because the StringIO buffer lives in memory, but on the other hand its contents are not persisted to disk.
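
A minimal sketch of that idea, using only the standard library: the buffer accepts the same write() calls as a file opened in text mode, and getvalue() returns everything written so far. The DOT text below is a made-up stand-in for what export_graphviz() would produce, not real output.

```python
import io

# Create an empty in-memory text buffer; nothing is written to disk.
buffer = io.StringIO()

# Write to it the same way you would write to a file opened with open(..., "w").
buffer.write("digraph Tree {\n")
buffer.write("0 -> 1 ;\n")  # illustrative DOT content only
buffer.write("}\n")

# getvalue() returns the whole contents as one str, wherever the cursor is.
dot_text = buffer.getvalue()
print(dot_text)

# The buffer can also be read like a file after rewinding to the start.
buffer.seek(0)
first_line = buffer.readline()
```

Once the buffer object is closed or garbage-collected, the text is gone, which is the "not persistent" part.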

In the example you're giving, you could also have used a regular text file.

This is an example with a regular dot text file:

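import matplotlib.pyplot as plt
import pydotplus
from sklearn.tree import export_graphviz

def show_tree(tree, features, path):
    f = 'tree.dot'  # export_graphviz writes the DOT text to this file on disk
    export_graphviz(tree, out_file=f, feature_names=features)
    pydotplus.graph_from_dot_file(f).write_png(path)  # read the file back and render a PNG
    img = plt.imread(path)  # plt.imread replaces scipy's misc.imread, removed in SciPy 1.2
    plt.rcParams['figure.figsize'] = (20, 20)
    plt.imshow(img)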

And this is another example, without a file and without StringIO, that uses the DOT string returned by export_graphviz() directly:

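import matplotlib.pyplot as plt
import pydotplus
from sklearn.tree import export_graphviz

def show_tree(tree, features, path):
    # With out_file=None, export_graphviz returns the DOT text as a string
    dot_data = export_graphviz(tree, out_file=None, feature_names=features)
    pydotplus.graph_from_dot_data(dot_data).write_png(path)
    img = plt.imread(path)  # plt.imread replaces scipy's misc.imread, removed in SciPy 1.2
    plt.rcParams['figure.figsize'] = (20, 20)
    plt.imshow(img)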

edited May 01 '19 at 08:55

answered Apr 28 '19 at 22:32

Hichem BOUSSETTA


Could you please write the code for using a regular text file? – Hasham Beyg Apr 29 '19 at 07:53
I am researching on .dot format as I am new to python. Will get back to you when I have an idea – Hasham Beyg May 01 '19 at 13:21
Could you please elaborate the impact of leaving out_file as none. – Hasham Beyg May 19 '19 at 21:46
The impact is just that export_graphviz() will not write to a file and will instead return the data as a string. I think this solution is equivalent to using StringIO because the data is kept in memory. – Hichem BOUSSETTA May 19 '19 at 22:41
So in all three cases i.e: 1- Storing the data in StringIO 2- Storing in a regular text file 3- And storing in the string content of the dot file. The data is stored as DOT data. Please correct me if I am wrong – Hasham Beyg May 24 '19 at 12:01
This might come as a silly question but if I may ask. What is the difference between writing (f) and (f.getvalue) in the following codes: pydotplus.graph_from_dot_file(f).write_png(path) pydotplus.graph_from_dot_data(f.getvalue()).write_png(path) – Hasham Beyg May 24 '19 at 12:44
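
On the last question in the comments, a hedged sketch using only the standard library: in the StringIO version, f is the stream object and f.getvalue() is the str it holds, which is what graph_from_dot_data() expects. In the regular-file version, f is just a path, and graph_from_dot_file() has to read the text from disk. The DOT text and the temporary file name are illustrative assumptions.

```python
import io
import os
import tempfile

dot_text = "digraph Tree {\n0 -> 1 ;\n}\n"  # illustrative DOT content only

# Case 1: f is an in-memory stream; f.getvalue() is the text inside it.
f = io.StringIO()
f.write(dot_text)
print(type(f).__name__)             # the container object
print(type(f.getvalue()).__name__)  # the contents, a plain string

# Case 2: f would be a path; the same text has to be read back from disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tree.dot")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write(dot_text)
    with open(path) as fh:
        from_disk = fh.read()

# Either way, the consumer ends up with identical DOT text.
same = from_disk == f.getvalue()
```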

score 1 · Accepted Answer · answered Apr 28 '19 at 22:52

Python is not my main language, but I think the answer to your question is quite simple and does not require a lot of research.

StringIO is used here as an in-memory input/output text stream. Your function shows the tree, but to do that it needs some way to move the data from one step to the next, a kind of data transport highway.

With f = io.StringIO() you initialize your data stream. After that you are free to use it however you want; in this particular case:

export_graphviz(tree, out_file=f, feature_names=features)

With out_file=f you export the data to the stream you initialized earlier with f = io.StringIO(). Because StringIO is an in-memory text file, you are putting your data aside in the stream object for later use. Thanks to that you don't have to write the data to a .dot file; you hold it temporarily instead (temporarily meaning for as long as the stream object is in use).

More about this particular case

pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)

Here f.getvalue() returns the .dot text held in the stream, and graph_from_dot_data() builds the graph from it. In the most basic usage you would supply the path to a .dot file where the previously generated data was stored, but you don't have to! That is the trick: the data is still in the stream object you created and filled beforehand. So all you have to do is pass it straight to this function, which generates the graph image from that data and saves it as a .png file.

Communication between files on the system and your program can be established in many ways, but usually you use streams. You initialize a stream, use it, and then close it. Every std::cout or std::cerr (a reference to my main language, sorry for the non-Python example) is such a stream. A stream lets you exchange data between your program and a designated target (e.g. the console, or in this case a file), but you can also use it as temporary storage, which in this particular case can speed up image generation because you don't have to write the .dot data to a file and load it back. All you have to do is write the data to the stream in a format the other function accepts, and then read it back from the very same stream to generate the image.
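
The write-then-read-back pattern described above can be sketched like this. write_dot and count_edges are hypothetical stand-ins for a producer that accepts a file-like out_file (as export_graphviz does) and a consumer that takes the text as a string (as graph_from_dot_data does); they are not part of any library.

```python
import io

def write_dot(out_file):
    """Hypothetical producer: writes DOT text to any object with a write() method."""
    out_file.write("digraph Tree {\n")
    out_file.write("0 -> 1 ;\n")
    out_file.write("}\n")

def count_edges(dot_text):
    """Hypothetical consumer: takes the DOT text as a plain string."""
    return dot_text.count("->")

# Initialize the stream, fill it, read it back, and let the with-block close it.
with io.StringIO() as stream:
    write_dot(stream)                       # producer writes into memory
    edges = count_edges(stream.getvalue())  # consumer reads the same data back

print(edges)
```

The producer never needs to know whether it is writing to a real file or to memory, which is why swapping in StringIO requires no other changes.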

More about StringIO

StringIO is human-readable as it uses Unicode. It has to be converted into BytesIO to make it readable by the computer. So why didn't the author of the code use BytesIO? And why didn't he assign any value to the StringIO? – Hasham Beyg, Apr 30 '19 at 21:15
Your target data recipient requires the `.dot` format, which is quite far from a binary format. Why, and based on what, do you think it needs to be converted to `BytesIO`? I don't understand the second part of your question about assigning a value to `StringIO`; I believe I explained the idea behind this module quite well above. – PStarczewski, Apr 30 '19 at 21:26
I am fairly new to Python and I just recently grasped the idea behind StringIO and BytesIO, which led me to ASCII, UTF and Unicode. StringIO works with Unicode strings and BytesIO works with byte strings. So basically StringIO is for text. You use it when you have text in memory that you want to treat as coming from or going to a file. BytesIO is for bytes. It's used in similar contexts as StringIO, except with bytes instead of text. And now I am beginning to understand the idea you explained above. – Hasham Beyg, May 01 '19 at 13:19
Nothing wrong with being fairly new. The string concept is common to many languages; once you get it, you'll easily understand it in any other language. @Hichem BOUSSETTA gave you good examples of a few different approaches to the same problem. Focus on reading the `pydotplus` documentation and its functions. If a function requires a `.dot` file or `.dot` data, that is exactly what you have to provide. Whether you provide it as a `stringstream`, an actual file with `.dot` data, or just a variable (the `graph_from_dot_data` case) is up to you. – PStarczewski, May 01 '19 at 14:36
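
The StringIO versus BytesIO distinction discussed in these comments can be sketched with the standard library alone. The DOT fragment is illustrative, and UTF-8 is an assumed encoding chosen for the example.

```python
import io

line = "node [shape=box] ;\n"  # illustrative DOT fragment

# StringIO: str goes in, str comes out.
text_buf = io.StringIO()
text_buf.write(line)
text_value = text_buf.getvalue()

# BytesIO: bytes go in, bytes come out, so text must be encoded first.
byte_buf = io.BytesIO()
byte_buf.write(line.encode("utf-8"))
byte_value = byte_buf.getvalue()

# Writing a str into a BytesIO is rejected with a TypeError.
try:
    byte_buf.write("text")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Since DOT is a text format, the text buffer is the natural fit here and no encoding step is needed.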
