
I was writing some Python code that involved something like this:

values = {}
for element in iterable:
    values.setdefault(element.name, []).append(element)

Because I could have sorted the input beforehand, I also implemented it like this:

values = {}

cur_name = None   # sentinel: no group started yet
cur_list = None

for element in iterable:
    if element.name != cur_name:
        # a new name starts: store the finished group
        # (the very first pass stores a dummy values[None] = None)
        values[cur_name] = cur_list
        cur_name = element.name
        cur_list = []
    cur_list.append(element)
if cur_list:
    values[cur_name] = cur_list
del values[None]  # drop the dummy entry from the first pass

Here the input is already sorted by element.name.

The second approach was much faster than the first approach, and it also used less memory.

What's the reason for this?

Or have I made some sort of mistake in the second approach?

Tianyang Li
  • `setdefault` is a perfectly good method for a standard dict. It cannot use "more memory" by itself; it just operates on its dictionary. Is this your question? – eumiro Aug 16 '12 at 07:18
  • @lazyr Yes, they are sorted by `element.name`. – Tianyang Li Aug 16 '12 at 07:46
  • 2
    Hi I have tested also (modified) variant without setdefault from Ashwini Chaudhary answer and it is also faster than your setdefault! https://gist.github.com/3368186 – Jiri Aug 16 '12 at 07:49

3 Answers


Your original code creates a new list every time round the loop, and most of those lists just get thrown away. It also makes multiple dictionary lookups: looking up the method setdefault is a dictionary lookup, and then the method itself does a dictionary lookup to see whether the key is already set and, if it isn't, does another to store the value. The .name and .append() lookups are also dictionary lookups, but they are still present in the revised code.

for element in iterable:
    values.setdefault(element.name, []).append(element)

The revised code only looks up the dictionary when the name changes, so it removes two dictionary lookups and a method call from every pass round the loop. That's why it's faster.
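
If you want to measure it, here's a rough benchmark sketch. The Element namedtuple and the generated input are just hypothetical stand-ins for your real data, and the grouped version guards against the initial None group instead of deleting it afterwards:

import timeit
from collections import namedtuple

Element = namedtuple('Element', ['name'])

# sorted input: 1000 distinct names, 100 consecutive elements each
iterable = [Element('name%d' % i) for i in range(1000) for _ in range(100)]

def with_setdefault():
    values = {}
    for element in iterable:
        values.setdefault(element.name, []).append(element)
    return values

def with_grouped_runs():
    values = {}
    cur_name = None
    cur_list = None
    for element in iterable:
        if element.name != cur_name:
            if cur_list is not None:
                values[cur_name] = cur_list
            cur_name = element.name
            cur_list = []
        cur_list.append(element)
    if cur_list is not None:
        values[cur_name] = cur_list
    return values

print(timeit.timeit(with_setdefault, number=10))
print(timeit.timeit(with_grouped_runs, number=10))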

As for the memory use: when a list grows it may sometimes have to copy its data, but it can avoid the copy if its memory block can simply be expanded in place. My guess would be that creating all of those unused temporary lists fragments the memory more and forces more copies. In other words, Python isn't actually using more memory; it may just have more allocated but unused memory.

When you feel a need for setdefault, consider using collections.defaultdict instead. That avoids creating the list except when it's needed:

from collections import defaultdict
values = defaultdict(list)
for element in iterable:
    values[element.name].append(element)

That will probably still be slower than your second code because it doesn't take advantage of your knowledge that names are all grouped, but for the general case it is better than setdefault.

Another way would be to use itertools.groupby. Something like this:

from itertools import groupby
from operator import attrgetter
values = {name: list(group)
          for name, group in groupby(iterable, attrgetter('name'))}

That takes advantage of the ordering and simplifies everything down to a single dictionary comprehension.
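
For illustration, here's a tiny self-contained run of that comprehension; the Element namedtuple is just a stand-in for your element type. Note that groupby only merges consecutive equal keys, which is exactly why it needs the sorted input:

from itertools import groupby
from operator import attrgetter
from collections import namedtuple

Element = namedtuple('Element', ['name', 'value'])

# already grouped by name, as groupby requires
iterable = [Element('a', 1), Element('a', 2), Element('b', 3)]

values = {name: list(group)
          for name, group in groupby(iterable, attrgetter('name'))}

print(values)
# {'a': [Element(name='a', value=1), Element(name='a', value=2)],
#  'b': [Element(name='b', value=3)]}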

Duncan

I can think of a couple of reasons why the second approach is faster.

values.setdefault(element.name, []).append(element)

Here you're creating an empty list for each element, even if you're never going to use it. You're also calling the setdefault method for every element, and that amounts to one hash table lookup and a possible hash table write, plus the cost of calling the method itself, which is not insignificant in Python. Finally, as the others have pointed out after I posted this answer, you're looking up the setdefault attribute once per element, even though it always refers to the same method.

In the second example you avoid all of these inefficiencies. You create only as many lists as you need, and you do it all without calling any method except the required list.append, interspersed with a few dictionary assignments. You're also, in effect, replacing a hash table lookup with a simple comparison (element.name != cur_name), and that's another improvement.
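
You can see those repeated lookups directly by disassembling the loop. This is just an illustrative sketch; the function body mirrors your first version:

import dis

def first_version(iterable):
    values = {}
    for element in iterable:
        values.setdefault(element.name, []).append(element)
    return values

dis.dis(first_version)
# Depending on your Python version, the loop body shows a LOAD_ATTR or
# LOAD_METHOD for setdefault, name and append on every iteration, plus a
# BUILD_LIST for the empty list that is usually thrown away.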

I expect you also get cache benefits, since you're not jumping all over the place when adding items to lists (which would cause lots of cache misses), but work on one list at a time. This way, the relevant memory is probably in a cache layer very near the CPU so that the process is faster. This effect should not be underestimated -- getting data from RAM is two orders of magnitude (or ~100 times) slower than reading it from L1 cache (source).

Of course the sorting adds a little time, but Python has one of the best and most optimized sorting algorithms in the world (Timsort), all coded in C, so the cost doesn't outweigh the benefits listed above.

I don't know why the second solution is more memory efficient, though. As Jiri points out, it might be the unnecessary lists, but my understanding is that these should be collected immediately by the garbage collector, so the first version should only increase the memory usage by a tiny amount -- the size of a single empty list. Maybe the garbage collector is lazier than I thought.
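
If you want to investigate, on a modern Python (3.4+) you could compare the peak allocations of the two versions with the standard-library tracemalloc module. This is only a sketch, with a hypothetical Element namedtuple standing in for the real input:

import tracemalloc
from collections import namedtuple

Element = namedtuple('Element', ['name'])
iterable = [Element('name%d' % i) for i in range(1000) for _ in range(100)]

def with_setdefault():
    values = {}
    for element in iterable:
        values.setdefault(element.name, []).append(element)
    return values

def measure(func):
    tracemalloc.start()
    func()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

print(measure(with_setdefault))  # peak bytes; swap in the second version to compare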

Lauritz V. Thaulow
  • I was running the program on a text file, and I watched the memory usage in Ubuntu's System Monitor. The first one took around 3GB of memory and I could really see the rise in memory usage. However, I couldn't see much of a change in memory usage for the second one. – Tianyang Li Aug 16 '12 at 07:52
  • Could the creation of all those empty lists be the reason for the worse memory efficiency? I think so, at least for large lists. – Jiri Aug 16 '12 at 07:54

Your first version has two inefficient parts:

  1. calling and dereferencing values.setdefault in a loop. You can assign values_setdefault = values.setdefault before the loop, which can speed things up a bit (see the sketch after this list).

  2. as the other answer suggested, creating a new empty list for every element in your input is slow and memory inefficient. I do not know how to avoid that while still using setdefault.
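
Here is a minimal sketch of the hoisting from point 1; iterable is assumed to be your sorted input:

values = {}
values_setdefault = values.setdefault  # bind the method once, outside the loop
for element in iterable:
    values_setdefault(element.name, []).append(element)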

Jiri