
The PETL documentation states that JSON, HTML, XML, and text data can only be loaded from a file. How can I load data in any of these formats into PETL from memory, such as a string variable, instead?

This would be useful for data that has already been cleansed or generated by upstream code. Writing it to a file only to read the file back is wasteful and risky (race conditions, etc.).

Bosco



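The following feels a little hacky, but at least it avoids writing anything to disk. petl's readers accept as a source any object that has an open(mode) method returning a context manager that yields a file-like object, so a small wrapper around the string is enough.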

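import petl
from io import BytesIO, StringIO


d = '''<table>
        <tr>
            <td>foo</td><td>bar</td>
        </tr>
        <tr>
            <td>a</td><td>1</td>
        </tr>
        <tr>
            <td>b</td><td>2</td>
        </tr>
        <tr>
            <td>c</td><td>2</td>
        </tr>
    </table>'''


class OpenableString:
    """Wrap a string so petl can treat it like a file source.

    petl only needs an object with an open(mode) method that returns
    a context manager yielding a file-like object.
    """

    def __init__(self, text):
        self.text = text

    def open(self, mode='r'):
        # Build a fresh stream on every call: petl tables are lazy and
        # reopen their source each time they are iterated, so a single
        # shared StringIO would already be exhausted on the second pass.
        # BytesIO and StringIO are context managers themselves.
        if 'b' in mode:
            return BytesIO(self.text.encode('utf-8'))
        return StringIO(self.text)


source = OpenableString(d)  # avoid shadowing the os module name

table1 = petl.fromxml(source, 'tr', 'td')

print(table1)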

Output:

+-----+-----+
| foo | bar |
+=====+=====+
| a   | 1   |
+-----+-----+
| b   | 2   |
+-----+-----+
| c   | 2   |
+-----+-----+
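The main catch with a wrapper like this is that petl tables are lazy and re-read their source every time they are iterated, so the wrapper has to hand out a fresh stream on each open() call. The sketch below checks that idea using only the standard library. InMemorySource, xml_rows and json_records are hypothetical names, and the two readers merely imitate how a petl reader opens its source; they are not petl's own code. The same wrapper should in principle also work with petl.fromjson and petl.fromtext, which likewise read from a source object, but that is an assumption here rather than something demonstrated.

```python
import json
import xml.etree.ElementTree as ET
from io import BytesIO, StringIO


class InMemorySource:
    """Hypothetical source object: open(mode) returns a context-managed
    file-like object, mirroring what petl expects from a source."""

    def __init__(self, text, encoding='utf-8'):
        self.text = text
        self.encoding = encoding

    def open(self, mode='r'):
        # Fresh stream on every call, so each pass starts at position 0.
        if 'b' in mode:
            return BytesIO(self.text.encode(self.encoding))
        return StringIO(self.text)


def xml_rows(source, row_tag, cell_tag):
    """Hypothetical lazy reader: reopens the source each time it is iterated."""
    with source.open('rb') as f:
        tree = ET.parse(f)
    for row in tree.iter(row_tag):
        yield tuple(cell.text for cell in row.iter(cell_tag))


def json_records(source):
    """Hypothetical reader for a JSON array of objects (text mode)."""
    with source.open('r') as f:
        return json.load(f)


xml_src = InMemorySource('<table><tr><td>foo</td><td>bar</td></tr>'
                         '<tr><td>a</td><td>1</td></tr></table>')
first = list(xml_rows(xml_src, 'tr', 'td'))
second = list(xml_rows(xml_src, 'tr', 'td'))  # second pass still sees the data
print(first == second, first)

json_src = InMemorySource('[{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}]')
print(json_records(json_src))

# For contrast: one shared StringIO is already consumed on the second read.
shared = StringIO('<table/>')
ET.parse(shared)
try:
    ET.parse(shared)
except ET.ParseError as exc:
    print('second parse failed:', exc)
```

Checking the mode string lets one wrapper serve readers that open their source in binary mode as well as those that open it in text mode.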