It sounds like you're asking about something like __length_hint__
. Excerpts from PEP 424 – A method for exposing a length hint:
CPython currently defines a __length_hint__
method on several types, such as various iterators. This method is then used by various other functions (such as list
) to presize lists based on the estimate returned by __length_hint__
. Types which are not sized, and thus should not define __len__
, can then define __length_hint__
, to allow estimating or computing a size (such as many iterators).
Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__
, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.
For example, range
iterators support this (Try it online!):
it = iter(range(1000))
print(it.__length_hint__()) # prints 1000
next(it)
print(it.__length_hint__()) # prints 999
And list
iterators even take list length changes into account (Try it online!):
a = [None] * 10
it = iter(a)
print(it.__length_hint__()) # prints 10
next(it)
print(it.__length_hint__()) # prints 9
a.pop()
print(it.__length_hint__()) # prints 8
a.append(None)
print(it.__length_hint__()) # prints 9
Generator iterators don't support it, but you can support it in other iterators you write. Here's a demo iterator that...
- Produces 10,000 elements.
- Hints at having 5,000 elements.
- After every 1,000 elements it shows the memory size of the list being built.
import gc
beacon = object()
class MyIterator:
def __init__(self):
self.n = 10_000
def __iter__(self):
return self
def __length_hint__(self):
print('__length_hint__ called')
return 5_000
def __next__(self):
if self.n == 0:
raise StopIteration
self.n -= 1
if self.n % 1_000 == 0:
for obj in gc.get_objects():
if isinstance(obj, list) and obj and obj[0] is beacon:
print(obj.__sizeof__())
return beacon
list(MyIterator())
Output (Try it online!):
__length_hint__ called
45088
45088
45088
45088
45088
50776
57168
64360
72456
81560
We see that list
asks for a length hint and from the start pre-allocates enough memory for 5,000 references of 8 bytes each, plus 12.5% overallocation. After the first 5,000 elements, it doesn't ask for length hints anymore, and keeps increasing its size bit by bit.
If my __length_hint__
instead accurately returns 10,000, then list
instead pre-allocates 90088
bytes and that remains until the end.