
I need to initialize a list of defaultdicts. If they were, say, strings, this would be tidy:

list_of_dds = [string] * n

…but for mutables, you get right into a mess with that approach:

>>> from collections import defaultdict
>>> x = [defaultdict(list)] * 3
>>> x[0]['foo'] = 'bar'
>>> x
[defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'})]
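Sure enough, the list holds three references to one and the same object; continuing the session above:

>>> x[0] is x[1] is x[2]
True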

What I do want is an iterable of freshly-minted distinct instances of defaultdicts. I can do this:

list_of_dds = [defaultdict(list) for i in xrange(n)]

but I feel a little dirty using a list comprehension here. I think there's a better approach. Is there? Please tell me what it is.
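
For the record, the comprehension does produce distinct instances; a quick sanity check with a throwaway n of 3:

>>> y = [defaultdict(list) for i in xrange(3)]
>>> y[0]['foo'] = 'bar'
>>> y[1]
defaultdict(<type 'list'>, {})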

Edit:

This is why I feel the list comprehension is suboptimal. I'm not usually the premature-optimization type, but I can't bring myself to ignore the speed difference here:

>>> timeit('x=[string.letters]*100', setup='import string')
0.9318461418151855
>>> timeit('x=[string.letters for i in xrange(100)]', setup='import string')
12.606678009033203
>>> timeit('x=[[]]*100')
0.890861988067627
>>> timeit('x=[[] for i in xrange(100)]')
9.716886043548584
kojiro

2 Answers


Your approach using the list comprehension is correct. Why do you think it's dirty? What you want is a list of things whose length is defined by some base sequence, and creating lists from a base sequence is exactly what list comprehensions do. What's wrong with using one here?

Edit: The speed difference is a direct consequence of what you are trying to do. [[]]*100 is faster because it only has to create one list and then repeat a reference to it 100 times. Creating a new list on every iteration is slower, yes, but you have to expect that if you actually want 100 different lists.
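
One way to see it directly is to count distinct object identities in the two results:

>>> len(set(map(id, [[]] * 100)))
1
>>> len(set(map(id, [[] for i in xrange(100)])))
100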

(In your string examples it doesn't create a new string each time, but it's still slower, because the list comprehension can't "know" ahead of time that every element will be the same, so it has to reevaluate the expression on each iteration. I don't know the internals of list comprehensions, but there may also be some list-resizing overhead, since the comprehension doesn't necessarily know the length of the index iterable up front and so can't preallocate the result. Note also that some of the slowdown in your string example comes from looking up string.letters on every iteration. On my system, using timeit.timeit('x=[letters for i in xrange(100)]', setup='from string import letters') instead, so that string.letters is looked up only once, cuts the time by about 30%.)
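
As a minimal sketch of that comparison (the variable names are just for illustration, and the absolute numbers will vary by machine):

from timeit import timeit

# string.letters is an attribute lookup repeated on every iteration
with_lookup = timeit('x=[string.letters for i in xrange(100)]', setup='import string')

# here the attribute lookup happens once, in setup; the loop body sees a bare name
bound_once = timeit('x=[letters for i in xrange(100)]', setup='from string import letters')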

BrenBarn
  • I've updated my question with an explanation why I don't like the list comprehension. I apologize for not putting it there in the first place. – kojiro Jul 25 '12 at 21:49
  • How does `x = [ [],[],[]... [] ]`, with the empty list copied out 100 times, perform? – Russell Borogove Jul 25 '12 at 21:59
  • @RussellBorogove surprisingly slowly at 3.87 seconds on my machine. `timeit("x=[%s]" % ('[],'*100))` – kojiro Jul 25 '12 at 22:04
  • @kojiro: see my edited answer. It has to be slower if you want distinct objects instead of the same object over and over again. – BrenBarn Jul 25 '12 at 22:09

The list comprehension is exactly what you should use.

The problem with list multiplication is that a list containing a single mutable object is created, and then you try to duplicate that object. But an object carries no record of the code that created it, so duplicating it from the object itself can never do what you actually want, which is to run the creation code N times.

You could use copy.copy or copy.deepcopy to duplicate it, but that puts you right back in the same boat because then the call to copy/deepcopy just becomes the code you need to run N times.
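
A sketch of what that looks like, just to make the point (template and n are illustrative names here):

import copy
from collections import defaultdict

n = 100  # example size
template = defaultdict(list)

# deepcopy runs once per element, which is exactly the per-element
# work the comprehension was already doing
list_of_dds = [copy.deepcopy(template) for i in xrange(n)]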

A list comprehension is a very good fit here. What's wrong with it?

Ben