Loading JSON, HTML, XML, or Text into PETL from memory rather than file

Question

The PETL documentation states that JSON, HTML, XML, and text data can only be loaded from a file. How can I load data in any of these formats from memory, such as a string variable, instead?

This would be useful when loading data that has already been cleansed or generated by upstream code. Writing the data to a file only to read it back is wasteful and risky (race conditions, etc.).

Bosco · Answer 1 · 2019-02-08T10:55:56.187

The following feels a little hacky, but at least it avoids writing anything to disk. It wraps the string in an object that provides the open() method PETL calls on its sources.

import petl
from io import StringIO


d = '''<table>
        <tr>
            <td>foo</td><td>bar</td>
        </tr>
        <tr>
            <td>a</td><td>1</td>
        </tr>
        <tr>
            <td>b</td><td>2</td>
        </tr>
        <tr>
            <td>c</td><td>2</td>
        </tr>
    </table>'''


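class OpenableString:
    """Expose an in-memory string through the open() interface petl expects."""

    def __init__(self, text):
        self.text = text

    def open(self, mode):
        # mode is ignored. StringIO is already a context manager, so it works
        # in a `with` block, and returning a fresh one per call lets the table
        # be iterated more than once.
        return StringIO(self.text)


source = OpenableString(d)

table1 = petl.fromxml(source, 'tr', 'td')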

print(table1)

Output:

+-----+-----+
| foo | bar |
+=====+=====+
| a   | 1   |
+-----+-----+
| b   | 2   |
+-----+-----+
| c   | 2   |
+-----+-----+
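
To see why a wrapper like this works, here is a minimal standard-library sketch of the same idea that runs without PETL. It assumes the consuming library calls open(mode) on the source and uses the result in a with block, which is what the class above relies on. InMemorySource and read_rows are hypothetical names for illustration only; read_rows is a stand-in for the library's reader, not PETL's actual code.

```python
import io
import xml.etree.ElementTree as ET


class InMemorySource:
    """Hypothetical wrapper exposing open(mode) over an in-memory string."""

    def __init__(self, text, encoding="utf-8"):
        self._text = text
        self._encoding = encoding

    def open(self, mode="r"):
        # Return a fresh buffer on every call so the source can be re-read.
        # BytesIO and StringIO both support the `with` statement.
        if "b" in mode:
            return io.BytesIO(self._text.encode(self._encoding))
        return io.StringIO(self._text)


def read_rows(source, row_tag, cell_tag):
    # Stand-in for the consumer (assumption): open the source, parse the XML,
    # and collect the text of each cell, row by row.
    with source.open("rb") as f:
        root = ET.parse(f).getroot()
    return [[cell.text for cell in row.findall(cell_tag)]
            for row in root.iter(row_tag)]


xml_text = (
    "<table>"
    "<tr><td>foo</td><td>bar</td></tr>"
    "<tr><td>a</td><td>1</td></tr>"
    "</table>"
)

source = InMemorySource(xml_text)
rows = read_rows(source, "tr", "td")
rows_again = read_rows(source, "tr", "td")  # a second read also succeeds
print(rows)
```

Creating the buffer inside open() rather than once in the constructor matters if the data is read more than once: a single shared StringIO would already be positioned at its end after the first pass.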
