Why is img_list smaller than compressed according to sys.getsizeof, even though compressed holds less data?

Input

import sys
import numpy as np

# img_list, compressed, and img are defined earlier (not shown in the question)
print(sys.getsizeof(img_list[0:1]))
print(img_list[0:1])
print(sys.getsizeof(compressed[0:2]))
print(compressed[0:2])

print(sys.getsizeof(img_list))
print(sys.getsizeof(compressed))

img_arr = np.asanyarray(img)
print(img_arr.shape)

comp_arr = np.asarray(compressed)
print(comp_arr.shape)

Output

72
[[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [19, 19, 19], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]]
80
[[12, [0, 0, 0]], [1, [19, 19, 19]]]

256
2536

(24, 24, 3)
(306, 2)
  • you haven't shown the length of your lists here. Can you provide a [mcve]? – juanpa.arrivillaga Dec 11 '19 at 18:03
  • @Error-SyntacticalRemorse I don't think that's what is going on here. Again, the OP hasn't actually provided the `len` of either list. I suspect that `img_list` is actually about 1/10 the length of `compressed`. – juanpa.arrivillaga Dec 11 '19 at 18:08
  • @Error-SyntacticalRemorse but the examples with *slices* are consistent, the smaller one has the smaller memory footprint. The slices have length 1 and 2 respectively... – juanpa.arrivillaga Dec 11 '19 at 18:10
  • I will provide the full lists instead – Den Fula Ankungen Dec 11 '19 at 18:10
  • I converted them to numpy arrays and printed the shape of them for clarification – Den Fula Ankungen Dec 11 '19 at 18:15
  • @Error-SyntacticalRemorse no, **logic** would not indicate that a list of length 1 is longer than a list of length 2. Logic would dictate the *opposite*. The list itself only contains references to other objects, and the `sys.getsizeof` function only takes into account the memory usage of the *list object itself*, and will not recursively find the total memory usage of the entire object graph. – juanpa.arrivillaga Dec 11 '19 at 18:15
  • @DenFulaAnkungen why do you think that is relevant? A `numpy.ndarray` is nothing like a Python list. Your first list **is smaller** than your second list, as implied by what the resulting numpy array shapes give you. That is, `len(img_list) == 24` and `len(compressed) == 306`. Just as I predicted, `img_list` is about 1/10 the length of `compressed`. Note, `sys.getsizeof` *only gives you the memory consumption of the list itself*, not of the items the list happens to be referencing – juanpa.arrivillaga Dec 11 '19 at 18:17
  • @juanpa.arrivillaga but the total data in the list is less in the second list, so why would that matter? I'm not the best at memory and arrays so please explain – Den Fula Ankungen Dec 11 '19 at 18:18
  • @DenFulaAnkungen because `list` objects are not arrays. The list itself only contains references to objects. Each reference requires (and this is an implementation detail) about a machine word (the size of a pointer), usually 8 bytes on a modern system. – juanpa.arrivillaga Dec 11 '19 at 18:19
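
To illustrate the point from the comments above (this snippet is not from the original thread): a list stores only references, so sys.getsizeof scales with the list's length and ignores what the elements are. The byte counts in the comments below are what a 64-bit CPython 3.x typically reports; they vary a little with the interpreter version and with how the list was built.

import sys

# Two lists of the same length report the same size, no matter how large the
# objects they reference are -- getsizeof only counts the pointer slots.
small_payload = [0, 1, 2]
big_payload = [list(range(1000)), list(range(1000)), list(range(1000))]
print(sys.getsizeof(small_payload))   # e.g. 80
print(sys.getsizeof(big_payload))     # e.g. 80 as well, despite ~3000 ints inside

# A longer list is bigger, regardless of what it references.
print(sys.getsizeof([None] * 24))     # roughly the 256 seen for img_list (24 rows)
print(sys.getsizeof([None] * 306))    # roughly the 2536 seen for compressed (306 entries)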

1 Answer


sys.getsizeof() is deceiving. It returns the size of an object; however, it does not include the sizes of the objects that object references (its attributes or elements). In other words, it does not recurse into the object you are measuring. Ultimately you need to do that on your own:

import sys

l1 = [[[0, 0, 0], [0, 19, 19], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 13, 0], [0, 0, 0], [19, 19, 19], [110, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 12, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]]

l2 = [[12, [0, 0, 0]], [1, [19, 19, 19]]]

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size

print(get_size(l1))
print(get_size(l2))

Output:

5280
564

Reference: Measure the Real Size of Any Python Object

What you were doing was basically:

sys.getsizeof([[]])

and

sys.getsizeof([[], []]) # This is bigger
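
For a rough sense of scale (numbers from a 64-bit CPython 3.x; they shift a little between versions), the analogy above compares with the recursive get_size like this:

import sys
# assumes the get_size function defined above is in scope

print(sys.getsizeof([[]]))        # ~64: list overhead plus one 8-byte slot
print(sys.getsizeof([[], []]))    # ~72: two slots, hence "bigger" despite holding nothing
print(get_size([[]]))             # ~120: also counts the inner empty list
print(get_size([[], []]))         # ~184: counts both inner empty lists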
  • Thanks a lot, really helpful – Den Fula Ankungen Dec 11 '19 at 18:20
  • I still have some issues though. When I run ```img_arr``` and ```comp_arr``` through the function, I get ```582``` for ```img_arr``` and ```5484``` for ```comp_arr```. When I run the lists they are almost the same size. I suppose those are the correct values, but why am I getting them? – Den Fula Ankungen Dec 11 '19 at 18:52
  • @DenFulaAnkungen is it possible several elements of your array are pointing to the same object? For example, `a = [[0,0,0],[0,0,0],[0,0,0]]` will have 3 distinct lists which happen to have the same elements, whereas `sub = [0,0,0]; b = [sub,sub,sub]` has 3 references to the same list and therefore takes less space. – Tadhg McDonald-Jensen Dec 11 '19 at 19:03
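
A small sketch of the aliasing point in the last comment (again assuming the get_size function from the answer is in scope): the seen set makes get_size count each distinct object only once, so shared rows shrink the recursive total while leaving len() and sys.getsizeof() unchanged.

import sys

a = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]   # three distinct inner lists
sub = [0, 0, 0]
b = [sub, sub, sub]                     # three references to the same inner list

print(len(a) == len(b))                        # True
print(sys.getsizeof(a) == sys.getsizeof(b))    # True: same number of pointer slots
print(get_size(a) > get_size(b))               # True: b's shared inner list is counted once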