
I'm using TinyDB for a small CLI utility to manage personal document drafts. The database stores metadata for each draft; the file should be human-editable (so that I can add details manually), and for this reason I'd like to use YAML over JSON as the format.

I implemented a TestYamlStorage class subclassing storages.Storage, as indicated in the TinyDB docs:

import os

import yaml
from tinydb import TinyDB
from tinydb.storages import Storage


def touch(path):
    """Create the file if it doesn't exist yet."""
    if not os.path.exists(path):
        with open(path, 'a'):
            pass


class TestYamlStorage(Storage):
    """
    Store the data in a YAML file.
    Written following the example at http://tinydb.readthedocs.io/en/latest/extend.html#write-a-custom-storage
    """
    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        touch(filename)

    def read(self):
        with open(self.filename) as handle:
            try:
                data = yaml.load(handle.read())
                return data
            except yaml.YAMLError:
                return None

    def write(self, data):
        print('writing data: {}'.format(data))
        with open(self.filename, 'w') as handle:
            yaml.dump(data, handle)

    def close(self):
        pass

Everything works fine when inserting a single element, or several elements at once using insert_multiple:

db = TinyDB('db.yaml', storage=TestYamlStorage)
dicts = [
    dict(name='Homer', age=38),
    dict(name='Marge', age=34),
    dict(name='Bart', age=10)
]

# this works as expected
db.insert_multiple(dicts)

The resulting db.yaml:

_default:
  1: {age: 38, name: Homer}
  2: {age: 34, name: Marge}
  3: {age: 10, name: Bart}

However, when inserting elements multiple times with insert, the resulting YAML file is different:

db = TinyDB('db.yaml', storage=TestYamlStorage)

db.insert(dict(name='Homer', age=38))
db.insert(dict(name='Bart', age=10))

db.yaml:

_default:
  1: !!python/object/new:tinydb.database.Element
    dictitems: {age: 38, name: Homer}
    state: {eid: 1}
  2: {age: 10, name: Bart}

Besides looking messier, data in this format doesn't seem to be compatible with yaml.safe_load (calling db.all() returns []). My interpretation is that the YAML serialization is in some way "over-eager", i.e. the Element instance gets written to db.yaml instead of the underlying data.
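
For reference, here is a minimal illustration (not part of my MWE, just a sketch) of that incompatibility: yaml.safe_load has no constructor for the python/object/new tag and raises a ConstructorError, which is a subclass of yaml.YAMLError, so a read() that used safe_load would swallow it and return None:

import yaml

doc = """
_default:
  1: !!python/object/new:tinydb.database.Element
    dictitems: {age: 38, name: Homer}
    state: {eid: 1}
"""

try:
    yaml.safe_load(doc)
except yaml.YAMLError as exc:
    # ConstructorError: could not determine a constructor for the python/object/new tag
    print(type(exc).__name__)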

Is there something wrong with my code? I've tried fiddling with PyYAML options, using a different YAML module (ruamel.yaml), and creating a second YamlStorage class copied from the default JSONStorage, but none of it made a difference.

Version info: Python 3.4.3, TinyDB 3.2.0, PyYAML 3.11. I posted a runnable MWE with all imports here.

Edit

Following @Anthon's suggestion, I tried dumping the YAML output to sys.stdout immediately before writing it to file. The problem is reproduced in that case as well. See notebook.
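
The debugging version of write() is roughly the following (a paraphrase of the notebook, not a verbatim copy):

import sys

    def write(self, data):
        print('writing data: {}'.format(data))
        yaml.dump(data, sys.stdout)  # inspect what the dumper receives before touching the file
        with open(self.filename, 'w') as handle:
            yaml.dump(data, handle)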

fndari

1 Answer


When you update an existing "database" you retrieve a database.Element, which (as you can see in the second YAML file) includes state information.

When that is saved again, you are not saving a dict but an instance of this Element, which is a subclass of dict. For that, ruamel.yaml (and PyYAML) needs to store both the dictitems (the key-value pairs of the dict) and the state (a dictionary of the attributes and their values).
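
You can see the same behaviour without TinyDB; the following is a minimal sketch using a stand-in Element class (not TinyDB's actual one), i.e. a dict that also carries an instance attribute:

import yaml


class Element(dict):
    """Minimal stand-in for tinydb.database.Element: a dict that also carries state."""
    def __init__(self, value=None, eid=None):
        super().__init__(value or {})
        self.eid = eid


e = Element({'age': 38, 'name': 'Homer'}, eid=1)
print(yaml.dump({1: e}))        # !!python/object/new:... with dictitems and state
print(yaml.dump({1: dict(e)}))  # plain mapping: {age: 38, name: Homer}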

Converting your Element to a dict explicitly before writing should do the trick:

    def write(self, data):
        print('writing data: {}'.format(data))
        with open(self.filename, 'w') as handle:
            yaml.dump(dict(data), handle)
    #                 ^^^^     ^
Anthon
  • I tried your suggestion, but unfortunately it doesn't seem to make a difference. If it can be relevant, `print(type(data))` before the call to `yaml.dump` returns ``. – fndari May 17 '16 at 17:02
  • I would rather do a YAML dump of that with `stream=sys.stdout`, so you can see what YAML thinks it is getting. – Anthon May 17 '16 at 19:01
  • I added a link to a notebook to my question with the dump to `sys.stdout`. – fndari May 18 '16 at 15:13