I am trying to calculate the exact in-memory size of protocol buffer objects in Python.
I went through the following resources: "How do I determine the size of an object in Python?" and https://goshippo.com/blog/measure-real-size-any-python-object/
However, protocol buffer objects do not include `__dict__` in dir(object), presumably to prevent corruption from people manually adding attributes to them. This is based on my understanding, which might be incomplete or incorrect.
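For comparison, ordinary Python classes can drop the per-instance `__dict__` by declaring `__slots__`, which also makes assigning an undeclared attribute raise AttributeError. The sketch below uses a hypothetical `SlottedRecord` class, not a protobuf type. Whether each protobuf backend (pure Python, C++ or upb) relies on this exact mechanism is an assumption not checked here. The sketch only illustrates why recipes that recurse through `obj.__dict__` find nothing to walk on such objects.

```python
class SlottedRecord:
    """Hypothetical stand-in for an object that has no per-instance __dict__."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value


record = SlottedRecord("ok1", 1)

# No __dict__, so a size recipe that recurses via obj.__dict__ skips the fields.
has_dict = hasattr(record, "__dict__")

# Assigning an attribute that is not declared in __slots__ is rejected.
try:
    record.extra = "not allowed"
    extra_rejected = False
except AttributeError:
    extra_rejected = True

print(has_dict, extra_rejected)  # False True
```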
So, I started with this protocol buffer message definition:

```protobuf
syntax = "proto2";

package test;

message Inner {
  optional bytes inner_id = 1;
  optional string inner_name = 2;
  optional int64 inner_value = 3;
}

message Outer {
  optional bytes uuid = 1;
  optional string name = 2;
  enum Test {
    kOne = 1;
    kTwo = 2;
  }
  optional Test testing = 3;
  repeated Inner inner_list = 4;
}
```
Here is a sample usage:

```python
import uuid

# test_pb2 is the module generated by protoc from the definition above
from test_pb2 import Inner, Outer

x = Outer()
x.uuid = uuid.uuid4().bytes
x.name = "test"
x.testing = Outer.kOne
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok1", inner_value=1)
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok2", inner_value=2)
x.inner_list.add(inner_id=uuid.uuid4().bytes, inner_name="ok3", inner_value=3)

print(id(x.inner_list))
print(id(x.inner_list[0].inner_id))
print(id(x.inner_list[1].inner_id))
print(id(x.inner_list[2].inner_id))
print(id(x.inner_list[0].inner_name))
print(id(x.inner_list[1].inner_name))
print(id(x.inner_list[2].inner_name))
print(id(x.inner_list[0].inner_value))
print(id(x.inner_list[1].inner_value))
print(id(x.inner_list[2].inner_value))
```
The printed ids of inner_id, inner_name and inner_value are the same across the three list entries, even though the entries are different and hold different values.
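Python's documentation for id() states that an id is only guaranteed to be unique among objects that exist at the same time; two objects with non-overlapping lifetimes may share the same id. A minimal sketch of that effect with plain Python lists follows. It assumes (not verified here) that each field access such as `x.inner_list[0].inner_name` may return a short-lived object that is freed right after `id()` returns, which would let the next access reuse the same memory address in CPython. The function names are hypothetical.

```python
def ids_of_temporaries(values):
    # Each [v] list becomes garbage as soon as id() returns, so CPython may
    # reuse its memory (and therefore its id) for the next list.
    return [id([v]) for v in values]


def ids_of_live_objects(values):
    # Every list stays referenced by `objs` until all ids have been taken,
    # so the lifetimes overlap and the ids are guaranteed to differ.
    objs = [[v] for v in values]
    return [id(obj) for obj in objs]


print(ids_of_temporaries(range(3)))   # values may repeat; nothing is guaranteed
print(ids_of_live_objects(range(3)))  # always three distinct values
```

Applied to the sample above, the analogous check would bind the values first, e.g. `names = [m.inner_name for m in x.inner_list]`, and only then compare `id()` of each element of `names`.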
So, my modification of the code from the links above did not work as expected:
```python
import sys

from google.protobuf.descriptor import FieldDescriptor


def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    elif hasattr(obj, 'ListFields'):
        # Protocol buffer message: recurse into each field that is set
        for desc, _ in obj.ListFields():
            if desc.label == FieldDescriptor.LABEL_REPEATED:
                size += sum([get_size(i, seen) for i in getattr(obj, desc.name)])
            else:
                # Read the field from the message being measured
                size += get_size(getattr(obj, desc.name), seen)
    return size
```
It gets caught by the id check (obj_id in seen), which returns 0 and therefore does not account for the separate memory used by "ok1" and "ok2", for example.
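One hedged way to address the early return, assuming the repeated ids come from short-lived objects whose addresses get recycled, is to store each visited object (not only its id) in `seen`, so nothing is freed and reused while the walk is still running. The sketch below is a variant of the function above under a hypothetical name, `get_deep_size`; it is not a protobuf API. It duck-types messages through `ListFields()` and uses the values that call returns. Note that `sys.getsizeof` only measures Python-level objects: with a C++ or upb protobuf backend, field data kept in native memory may not be reflected, so treat the total as an approximation. If the serialized size is what matters, protobuf messages expose `ByteSize()` for that.

```python
import sys


def get_deep_size(obj, seen=None):
    """Approximate deep size in bytes of obj and everything reachable from it."""
    if seen is None:
        seen = {}
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Keep a reference to the object itself (not just its id): while it stays
    # alive, CPython cannot give the same id to a later temporary object.
    seen[obj_id] = obj
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_deep_size(k, seen) + get_deep_size(v, seen) for k, v in obj.items())
    elif hasattr(obj, "ListFields"):
        # Duck-typed protobuf message: ListFields() yields (descriptor, value)
        # pairs for the fields that are set; repeated values are containers,
        # which are handled by the branches below when recursed into.
        for _, value in obj.ListFields():
            size += get_deep_size(value, seen)
    elif hasattr(obj, "__dict__"):
        size += get_deep_size(vars(obj), seen)
    elif hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(get_deep_size(item, seen) for item in obj)
    return size
```

The dict-based `seen` trades some extra memory during the walk for stable ids; once the call returns, the references are dropped together with `seen`.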
Could anyone please explain why the ids are the same, and how to correctly calculate the size of protocol buffer objects?
Thanks in advance.