I wonder whether there is some mechanism to freeze a dict and automatically turn it into some kind of class with slots. I need to store some JSON files and want to quickly gain the space-saving advantages of slots, but I don't want to create all the classes by hand, because our JSON is pretty diverse.

frozendict does not seem to do that; at least pympler.asizeof is not showing any lower memory footprint.

Update: I have objects like this

{ 
  "titles": [{"title": x, "subtitle": y}, {"title": v, "subtitle": z}],
  "authors": [{"name": x, "location": y}, {"name": z}],
  "keywords": [{"type": a, "content": b}, ...],
  "prices": a,
   ...
}

The keywords in particular are used quite often and consume a good portion of memory. So far I have transformed the keywords into a slotted class, but I wanted to know whether I can somehow put this whole structure into a kind of fixed dict, where I declare that nothing will change anymore, so that the memory footprint can be optimized for it.

RichieK

1 Answer

So far I have come up with this:

def compress_dict(slots):
    class SlottedDict:
        # One slot per key of the source dict; keys must be valid identifiers.
        __slots__ = tuple(slots)

        def __init__(self, slots):
            for key, value in slots.items():
                if isinstance(value, dict):
                    # Nested dicts are compressed recursively.
                    value = compress_dict(value)
                elif isinstance(value, list) and all(isinstance(item, dict) for item in value):
                    # Lists consisting only of dicts are compressed item by item.
                    value = [compress_dict(item) for item in value]
                setattr(self, key, value)

        def get(self, item, default=None):
            # Mirrors dict.get: fall back to default for unknown keys.
            return getattr(self, item, default)

        def to_json(self):
            # Rebuild the original plain dict/list structure.
            json_dict = {}
            for key in self.__slots__:
                item = getattr(self, key)
                # Every call creates its own class, so compare by name, not identity.
                if type(item).__name__ == "SlottedDict":
                    item = item.to_json()
                elif isinstance(item, list):
                    item = [i.to_json() if type(i).__name__ == "SlottedDict" else i for i in item]
                json_dict[key] = item
            return json_dict

    return SlottedDict(slots)

I'm not sure whether I can improve this somehow, but so far it seems to work. For my example I can access the data like this:

>>> d = { 
  "titles": [{"title": "x", "subtitle": "y"}, {"title": "v", "subtitle": "z"}],
}

>>> compress_dict(d).titles[0].title
'x'

>>> from pympler.asizeof import asizeof
>>> print(asizeof(d), asizeof(compress_dict(d)))
1200 480

>>> print(compress_dict(d).to_json() == d)
True

I'm just surprised that there is nothing like this in the Python stdlib; at least I did not find anything.
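The closest thing in the stdlib that I am aware of is probably collections.namedtuple, which builds a small fixed-field class from a list of names. It is not a drop-in replacement: instances are immutable tuples, and field names must be valid identifiers that do not start with an underscore. A minimal sketch for the keyword objects, where the class name Keyword is only my choice for illustration:

```python
import sys
from collections import namedtuple

# Hypothetical record type for the {"type": ..., "content": ...} keyword dicts.
Keyword = namedtuple("Keyword", ["type", "content"])

raw = {"type": "a", "content": "b"}
kw = Keyword(**raw)

# Attribute access instead of key access.
print(kw.type, kw.content)

# _asdict() converts back to a plain dict, e.g. for json.dumps.
print(kw._asdict() == raw)

# Shallow sizes only; the strings themselves are shared and not counted.
print(sys.getsizeof(raw), sys.getsizeof(kw))
```

The class is created once and reused for every keyword, so only the small tuple is paid for per object.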

Edit:

I found out that all these generated classes also have to be stored somewhere. My test was as follows: I compared RAM usage for a simple dict ({"a": "b", "c": "d"}) stored as a) the result of compress_dict, b) a slotted class containing only the slots a and c, and c) the dict itself, each with 1_000_000 items in a list. a) led to 2.6 GB, b) to 98 MB and c) to 270 MB. So the big disadvantage seems to be that this code creates a new class on every call and cannot reuse classes it has already created automatically.

RichieK