
I have the following pattern in code which I use quite frequently:

def update_missing_content_type(cls):
    items_missing_content_type = ItemMaster.objects.filter(content_type__isnull=True)
    num_items = items_missing_content_type.count()
    for num, item in enumerate(items_missing_content_type):
        if num % 100 == 0:
            log.info('>>> %s / %s updated...' % (num+1, num_items))
        # do something

The enumerate can be non-ideal though if the size of the Query is non-trivial. However, I still need to know the progress of the script (it might run for ten hours, etc.).

What would be a better pattern than the above for doing something over a number of results while logging its overall progress?

David542

2 Answers


The enumerate can be non-ideal though if the size of the Query is non-trivial.

enumerate produces an iterator, not a list, so it doesn't use additional memory by preallocating anything. Plus, if it built a list up front, it wouldn't work on infinite-length generators, and it does.
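To make that concrete, here is a minimal sketch, not the asker's actual code. `endless_items` and `process_with_progress` are hypothetical names standing in for a large, lazily evaluated result set and for the loop in the question. It assumes Python 3 and the standard `logging` module.

```python
import itertools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def endless_items():
    """Hypothetical stand-in for a large, lazily evaluated result set."""
    for n in itertools.count():
        yield "item-%d" % n


def process_with_progress(items, total=None, every=100):
    """Process items one at a time, logging progress every `every` items."""
    done = 0
    for done, item in enumerate(items, start=1):
        # do something with item here
        if done % every == 0:
            shown_total = total if total is not None else "?"
            log.info(">>> %s / %s updated...", done, shown_total)
    return done


# enumerate() over an infinite generator returns immediately: nothing is
# consumed until the loop asks for the next (index, item) pair.
pairs = enumerate(endless_items())
first_three = list(itertools.islice(pairs, 3))
print(first_three)

# Only 250 items are ever pulled from the endless generator.
processed = process_with_progress(itertools.islice(endless_items(), 250), total=250)
print(processed)
```

Because `enumerate` only wraps the underlying iterator, the memory cost of the loop is whatever the iterable itself costs; the counter adds one integer.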

rlbond

enumerate returns an iterator and produces the integer indices on the fly. More details here: What is the implementation detail for enumerate? In terms of performance, enumerate should behave almost identically to looping over the indices of the iterable and looking up each item.

Presumably you need both the index (for logging) and the item (in # do something), so we can time the two approaches. Here are my results:

python -m timeit -s 'test=range(10)*1000' 'for i, elem in enumerate(test): pass'
1000 loops, best of 3: 370 usec per loop

python -m timeit -s 'test=range(10)*1000' 'for i in xrange(len(test)): elem=test[i]'
1000 loops, best of 3: 397 usec per loop

As expected, there is no meaningful difference in speed between the two in this use case (370 vs. 397 usec per loop). There is, however, a difference if you don't need the index:

python -m timeit -s 'test=range(10)*1000' 'for elem in test: pass'
10000 loops, best of 3: 153 usec per loop
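The commands above are Python 2 (`xrange`, and `range(10)*1000` producing a list). For anyone wanting to repeat the comparison on Python 3, here is a rough sketch using the `timeit` module directly. The function names and the `best_usec` helper are made up for illustration, and wrapping each loop in a function adds a little call overhead, so the absolute numbers will not match the figures quoted above.

```python
import timeit

# Python 3 spelling of range(10)*1000: a list of 10,000 integers.
test = list(range(10)) * 1000


def with_enumerate():
    for i, elem in enumerate(test):
        pass


def with_index_lookup():
    for i in range(len(test)):
        elem = test[i]


def without_index():
    for elem in test:
        pass


def best_usec(func, number=50, repeat=3):
    """Best per-loop time in microseconds over `repeat` runs of `number` loops."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


results = {
    func.__name__: best_usec(func)
    for func in (with_enumerate, with_index_lookup, without_index)
}
for name, usec in results.items():
    print("%-18s %8.1f usec per loop" % (name, usec))
```

The timings depend on the machine and interpreter version, so treat the output as a relative comparison only.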

drglove
  • `enumerate` is not a generator and does not involve `yield`. It returns an iterator; generators are a specific way of implementing iterators. – user2357112 Jul 03 '15 at 02:52
  • @user2357112 You're correct, the source code (https://hg.python.org/cpython/file/2.7/Objects/enumobject.c) uses the word "yields" in the documentation, but this is not the keyword `yield` referred to by the language specification. My mistake, I will update accordingly. – drglove Jul 04 '15 at 01:55