Your tree
is just a dict with up to 10 key-value pairs. In bigger tree, there aren't any more key-value pairs. There are more values inside the values inside the … inside the key-value pairs, but there's still only 10 key-value pairs in the dict. And a dict with around 10 key-value pairs taking 368 bytes seems like about what you should expect.1
As the docs for getsizeof
say:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
…
See recursive sizeof recipe for an example of using getsizeof()
recursively to find the size of containers and all their contents.
Since you don't actually have a completely arbitrary data structure, but just a dict of dicts of etc. And, while you do have some shared references (e.g., if you read the number 1234567
while you already have an int with the same value in memory, Python will just reuse the same object), if you're trying to verify that you can fit into 1.5MB, you really want a worst-case measurement, so you probably want to skip the check for already-seen values.
So, you can write something simpler instead of using that recipe if you want. But the idea will be the same:
def total_dict_size(d):
size = sys.getsizeof(d)
if isinstance(d, dict):
for key, value in d.items():
size += sys.getsizeof(key) + total_dict_size(value)
return size
Your xtree
, on the other hand, is a dict with 8999999 key-value pairs. Doing the same back-of-the-envelope calculation, I'd expect that to be a bit under 300MB. Instead, it's a bit over 300MB. Close enough.
And you're also storing the 8999999 7-digit integers on the heap. To take some nice round numbers, let's say there are 5M distinct integers that don't fall into the handful of small values pre-created and cached by CPython. Each of those integers is small enough to fit into one 30-bit digit, so they take 28 bytes apiece on 64-bit CPython. So, that's another 140MB not accounted for in sys.getsizeof(xtree)
(but they are accounted for—in fact, over-accounted, with the worst-case-measuring implementation given) if you call the recursive function above on either tree
or xtree
.
So, your total memory use between tree
, xtree
, and the actual integers is probably somewhere on the order of 750MB, which doesn't quite fit the < 1.5MB
requirement.
1. Every Python object has some fixed header overhead, for things like the refcount, the pointer to the type, etc., plus type-specific things, like the length for most container types. Call that 64 bytes. A dict then has a hash table. It needs to be a bit bigger than 10 slots, to keep the load well below 1.0; call it 13 slots. Each slot needs a hash value, a reference to the key, and a reference to the value, so that's 3 pointers, or 24 bytes. 64 + 13 * 24 = 376. So that back-of-the-envelope calculation is off by only 8 bytes…