I can think of a couple of reasons why the second approach is faster.
```
values.setdefault(element.name, []).append(element)
```
Here you're creating an empty list for each `element`, even if you're never going to use it. You're also calling the `setdefault` method for every element, which amounts to one hash table lookup and a possible hash table write, plus the cost of the method call itself, which is not insignificant in Python. Finally, as others have pointed out since I posted this answer, you're looking up the `setdefault` attribute once per element, even though it always resolves to the same method.
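Just to make that concrete, here's a rough sketch of the per-element work (the `Element` type and the data are my own stand-ins, not the original code), plus the common trick of caching the bound method to avoid the repeated attribute lookup:

```
from collections import namedtuple

# Stand-in data; the real element type is an assumption on my part.
Element = namedtuple("Element", ["name", "value"])
elements = [Element("a", 1), Element("b", 2), Element("a", 3)]

# First approach: every iteration pays for an attribute lookup, a method
# call, a hash table lookup (and possibly a write), and a throwaway [].
values = {}
for element in elements:
    values.setdefault(element.name, []).append(element)

# Caching the bound method removes the repeated attribute lookup, but the
# hash lookup and the throwaway empty list are still there.
values = {}
setdefault = values.setdefault
for element in elements:
    setdefault(element.name, []).append(element)
```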
In the second example you avoid all of these inefficiencies. You're only creating as many lists as you need, and you do it all without calling any method except the required `list.append`, interspersed with a smattering of dictionary assignments. You're also, in effect, replacing the hash table lookup with a simple comparison (`element.name != cur_name`), and that's another improvement.
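I don't have the exact second version in front of me, but judging from that comparison it presumably looks roughly like this sketch (reusing the toy `elements` from above; the variable names besides `cur_name` are guesses):

```
# Second approach (sketch): sort once, then sweep through runs of equal
# names, so each element costs a comparison and an append rather than a
# hash table lookup.
values = {}
cur_name = None
cur_list = None
for element in sorted(elements, key=lambda e: e.name):
    if element.name != cur_name:
        cur_name = element.name
        cur_list = values[cur_name] = []
    cur_list.append(element)
```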
I expect you also get cache benefits, since you're not jumping all over the place when adding items to lists (which would cause lots of cache misses) but are working on one list at a time. This way the relevant memory is probably in a cache layer very near the CPU, so the process is faster. This effect should not be underestimated -- getting data from RAM is two orders of magnitude (roughly 100 times) slower than reading it from L1 cache (source).
Of course the sorting adds a little time, but Python's sort (Timsort, implemented in C) is one of the most optimized sorting routines around, so it doesn't outweigh the benefits listed above.
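If you want to see how the sort cost plays out on your own data, a quick `timeit` comparison along these lines will tell you (the two loops above wrapped in functions; the data size and name distribution here are arbitrary, so measure with something representative):

```
import timeit
from collections import namedtuple

Element = namedtuple("Element", ["name", "value"])
# Arbitrary toy data: 100000 elements spread over 100 names.
elements = [Element("name%d" % (i % 100), i) for i in range(100000)]

def with_setdefault(elements):
    values = {}
    for element in elements:
        values.setdefault(element.name, []).append(element)
    return values

def with_sort(elements):
    values = {}
    cur_name = None
    cur_list = None
    for element in sorted(elements, key=lambda e: e.name):
        if element.name != cur_name:
            cur_name = element.name
            cur_list = values[cur_name] = []
        cur_list.append(element)
    return values

for fn in (with_setdefault, with_sort):
    print(fn.__name__, timeit.timeit(lambda: fn(elements), number=10))
```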
I don't know why the second solution is more memory efficient, though. As Jiri points out, it might be the unnecessary lists, but my understanding is that those should be collected immediately by the garbage collector, so memory usage should only be higher by a tiny amount -- the size of a single empty list. Maybe the garbage collector is lazier than I thought.
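For what it's worth, under CPython the throwaway list really is reclaimed the moment `setdefault` returns, since nothing references it any more. Here's a small experiment showing that (the `NoisyList` subclass is only there to make the reclamation visible; this relies on CPython's reference counting and isn't guaranteed by the language):

```
class NoisyList(list):
    def __del__(self):
        print("throwaway list reclaimed")

values = {"a": [1, 2]}

# The key already exists, so the NoisyList default is never stored;
# its reference count drops to zero as soon as the call finishes, and
# the message prints before the next line runs.
values.setdefault("a", NoisyList()).append(3)
print(values["a"])  # [1, 2, 3]
```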