18

It would be convenient if a defaultdict could be initialized along the following lines

d = defaultdict(list, (('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2),
   ('b', 3)))

to produce

defaultdict(<type 'list'>, {'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]})

Instead, I get

defaultdict(<type 'list'>, {'a': 2, 'c': 3, 'b': 3, 'd': 4})

To get what I need, I end up having to do this:

d = defaultdict(list)
for x, y in (('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3)):
    d[x].append(y)

This is, IMO, one step more than should be necessary. Am I missing something here?

smci
iruvar
  • 1
    How would it know to use append on the list? And what about other types? – Jon Clements Aug 29 '13 at 21:02
  • @JonClements, good point. However, one would think that `list` is a common enough use case that a convenience method (perhaps a class method) is justified? – iruvar Aug 29 '13 at 21:25
  • 5
    Wouldn't this convenience method be pretty much exactly what you just wrote out at the end of your post? Why not wrap those three lines in a function and call it a day? – Henry Keiter Aug 29 '13 at 21:34
  • 1
    @1_CR: I would say that `defaultdict(list)` is super-common. But **initializing** that data structure in the manner you propose is... less so. – John Y Aug 29 '13 at 21:42

5 Answers

21

What you're apparently missing is that defaultdict is a straightforward (not especially "magical") subclass of dict. All the first argument does is provide a factory function for missing keys. When you initialize a defaultdict, you're initializing a dict.
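A minimal sketch of that point, written for Python 3 (the reprs quoted in this thread come from Python 2, so the output formatting differs). Everything after the first argument is handed to the ordinary `dict` constructor, so a repeated key is simply overwritten, and the factory only runs when a missing key is looked up:

```python
from collections import defaultdict

pairs = (('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3))

# The pairs go straight to dict's constructor: later duplicates win.
d = defaultdict(list, pairs)
print(dict(d))              # {'a': 2, 'b': 3, 'c': 3, 'd': 4}

# defaultdict really is a dict subclass.
print(isinstance(d, dict))  # True

# The factory is consulted only when a missing key is looked up.
print(d['z'])               # []
```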

If you want to produce

defaultdict(<type 'list'>, {'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]})

you should be initializing it the way you would initialize any other dict whose values are lists:

d = defaultdict(list, (('a', [1, 2]), ('b', [2, 3]), ('c', [3]), ('d', [4])))

If your initial data has to be in the form of tuples whose 2nd element is always an integer, then just go with the for loop. You call it one extra step; I call it the clear and obvious way to do it.
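If the loop shows up often, it can be wrapped once in a small function, as one of the comments on the question suggests. The sketch below is written for Python 3; the name `grouped_defaultdict` and its `factory` and `add` parameters are made up for illustration and are not part of `collections`. Passing the "add" operation explicitly is one way to answer the question of how the dict would know to call `append`:

```python
from collections import defaultdict

def grouped_defaultdict(pairs, factory=list, add=list.append):
    """Hypothetical helper: group (key, value) pairs into a defaultdict.

    factory builds the empty container for a new key;
    add(container, value) puts one value into that container.
    """
    d = defaultdict(factory)
    for key, value in pairs:
        add(d[key], value)
    return d

pairs = (('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3))

d = grouped_defaultdict(pairs)
print(dict(d))  # {'a': [1, 2], 'b': [2, 3], 'c': [3], 'd': [4]}

# The same helper with sets instead of lists.
s = grouped_defaultdict(pairs, factory=set, add=set.add)
print(s['a'] == {1, 2})  # True
```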

John Y
  • 3
    +1. BTW, initializing it as `defaultdict(list, {'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]})` also works. – ChaimG Jan 19 '18 at 15:29
10

The behavior you describe would not be consistent with defaultdict's other behaviors. It seems like what you want is a FooDict such that

>>> f = FooDict()
>>> f['a'] = 1
>>> f['a'] = 2
>>> f['a']
[1, 2]

We can do that, but not with defaultdict; let's call it AppendDict:

import collections

class AppendDict(collections.MutableMapping):
    def __init__(self, container=list, append=None, pairs=()):
        self.container = collections.defaultdict(container)
        self.append = append or list.append
        for key, value in pairs:
            self[key] = value

    def __setitem__(self, key, value):
        self.append(self.container[key], value)

    def __getitem__(self, key): return self.container[key]
    def __delitem__(self, key): del self.container[key]
    def __iter__(self): return iter(self.container)
    def __len__(self): return len(self.container)
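
A usage sketch of the class above, assuming Python 3, where the abstract base class is imported from `collections.abc` rather than from `collections`. The class body is repeated only so the example is self-contained. One caveat that follows from storing the data in a `defaultdict`: reading a missing key creates an empty container for it.

```python
from collections import defaultdict
from collections.abc import MutableMapping  # Python 3 location of the ABC

class AppendDict(MutableMapping):
    def __init__(self, container=list, append=None, pairs=()):
        self.container = defaultdict(container)
        self.append = append or list.append
        for key, value in pairs:
            self[key] = value

    def __setitem__(self, key, value):
        # Assignment accumulates instead of overwriting.
        self.append(self.container[key], value)

    def __getitem__(self, key): return self.container[key]
    def __delitem__(self, key): del self.container[key]
    def __iter__(self): return iter(self.container)
    def __len__(self): return len(self.container)

f = AppendDict()
f['a'] = 1
f['a'] = 2
print(f['a'])   # [1, 2]

pairs = (('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3))
g = AppendDict(pairs=pairs)
print(dict(g))  # {'a': [1, 2], 'b': [2, 3], 'c': [3], 'd': [4]}

# A different container type needs a matching "append" operation.
h = AppendDict(container=set, append=set.add, pairs=pairs)
print(h['b'] == {2, 3})  # True
```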
SingleNegationElimination
4

Sorting and itertools.groupby go a long way:

>>> L = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3)]
>>> L.sort(key=lambda t:t[0])
>>> d = defaultdict(list, [(tup[0], [t[1] for t in tup[1]]) for tup in itertools.groupby(L, key=lambda t: t[0])])
>>> d
defaultdict(<type 'list'>, {'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]})

To make this more of a one-liner:

L = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3)]
d = defaultdict(list, [(tup[0], [t[1] for t in tup[1]]) for tup in itertools.groupby(sorted(L, key=operator.itemgetter(0)), key=lambda t: t[0])])
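
A somewhat more readable sketch of the same idea, written for Python 3 and using a plain `dict` comprehension (the variable names are illustrative). It also shows why the sort matters: `groupby` only merges adjacent items with equal keys, so on unsorted input a key can produce several groups and the later group overwrites the earlier one.

```python
from itertools import groupby
from operator import itemgetter

L = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3)]
first = itemgetter(0)

# Sort by key first so that equal keys are adjacent, then group.
d = {key: [value for _, value in group]
     for key, group in groupby(sorted(L, key=first), key=first)}
print(d)  # {'a': [1, 2], 'b': [2, 3], 'c': [3], 'd': [4]}

# Without the sort, 'a' and 'b' each form two groups and values are lost.
unsorted_result = {key: [value for _, value in group]
                   for key, group in groupby(L, key=first)}
print(unsorted_result)  # {'a': [2], 'b': [3], 'c': [3], 'd': [4]}
```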

Hope this helps

inspectorG4dget
  • 4
    If the OP doesn't like a perfectly fine `for` loop, I doubt that `itertools.groupby`, a `sort`, a list comp, and either `lambda` or `itemgetter` is going to appeal. – DSM Aug 29 '13 at 21:09
  • 1
    Interesting. But note that the `sort` and the `groupby` end up doing all the legwork here, so you could just as easily feed the output from the `groupby` to a regular `dict` instead of a `defaultdict`! – iruvar Aug 29 '13 at 21:16
  • 1
    @1_CR: You're right. But, I gave you a defaultdict because you asked for one. – inspectorG4dget Aug 29 '13 at 21:18
  • What about performance, is it slower or faster if you make it all in one line? – badc0re Aug 29 '13 at 21:20
  • 1
    @badc0re: I haven't run `timeit`s, but I suspect that `sorted` would be slower than `sort` (time for `malloc`); the list comprehension would be faster than the equivalent for loop (I `timeit`'d this a few years ago when I was interested in this sort of thing), presumably because the interpreter is able to accurately predict the size of the list – inspectorG4dget Aug 29 '13 at 21:23
  • @inspectorG4dget: Given that Python almost always has more than enough free memory around to create a 5-element list (it starts off with a few hundred MB on most platforms, and keeps an even larger freelist if you go beyond that), I doubt there are any `malloc` costs at all. – abarnert Aug 29 '13 at 21:46
  • @abarnert: (honest question) won't `malloc` for a list have a cost anyway, though? Won't this be avoided with `sort` (as opposed to `sorted`)? If you were talking about the list comprehension, I would have to use the scalability argument, as I /have/ observed 10x speedups with list comprehensions in the past – inspectorG4dget Aug 29 '13 at 21:49
  • @inspectorG4dget: If you never call `malloc`, it doesn't have a cost. And that's the point; you almost never call `malloc`. (Of course you're right that a list comprehension is faster than an explicit loop around `append` or the like, because the core loop is happening in C instead of Python, a number of bits can be short-circuited, you can take advantage of the custom `LIST_APPEND` opcode, etc. But that's not what I was talking about.) – abarnert Aug 29 '13 at 22:00
  • @inspectorG4dget: It is true that `a.sort()` is usually faster than `sorted(a)`, but that usually has nothing to do with `malloc`; it's the fact that `sort` can rely on having a list, and that it can munge the list in-place, meaning it can use fast-indexing, swap and shift values directly, etc. For tiny and huge lists this doesn't help as much, but for mid-sized ones it can be an order of magnitude faster (to the point that people sometimes suggest that `sorted(x)` just do `list(x); x.sort(); return x`). – abarnert Aug 29 '13 at 22:02
  • @abarnert: please explain that speedup on mid-sized lists some more. I always thought that a large factor of the speedup came from `sorted` having to malloc more space for the new copy of the list (since the core C code under python's hood would have to `malloc` for a new list, no?) – inspectorG4dget Aug 29 '13 at 22:04
  • @inspectorG4dget: I don't know how else to explain it, but I'll try. CPython has a custom allocator (actually more than one, but let's ignore that). When some higher-level code wants to create a new `list` object, or a new buffer for the array used by the `list`, it doesn't call `malloc`, it just asks the allocator. Usually this just means reusing an object or buffer that was recently released, or was pre-allocated at startup. Of course the allocator _may_ be forced to call `malloc` (or something else) if it runs out of space, but it rarely has to. – abarnert Aug 29 '13 at 22:14
  • @abarnert: dude that's awesome! I'd love for you to chime in on [this question I just posted as a result of this conversation](http://stackoverflow.com/q/18522574/198633) – inspectorG4dget Aug 29 '13 at 23:31
3

I think most of this is a lot of smoke and mirrors to avoid a simple for loop:

di={}
for k,v in [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2),('b', 3)]:
    di.setdefault(k,[]).append(v)
# di={'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]}
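
For comparison, a small Python 3 sketch showing that the `setdefault` loop and the `defaultdict` loop from the question build the same mapping (the variable names `plain` and `dd` are illustrative). The practical difference is that the plain dict still treats unknown keys as missing, while `setdefault` constructs a fresh empty list on every call, even when the key already exists.

```python
from collections import defaultdict

pairs = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2), ('b', 3)]

# Plain dict with setdefault, as in this answer.
plain = {}
for k, v in pairs:
    plain.setdefault(k, []).append(v)

# defaultdict, as in the question.
dd = defaultdict(list)
for k, v in pairs:
    dd[k].append(v)

print(plain == dd)  # True: same keys, same lists
print(plain)        # {'a': [1, 2], 'b': [2, 3], 'c': [3], 'd': [4]}

# Looking up an unknown key in the plain dict raises KeyError
# instead of silently creating an entry.
try:
    plain['z']
except KeyError:
    print("plain dict: 'z' is missing")
```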

If your goal is one line, and you want abusive syntax that I cannot at all endorse or support, you can use a side-effect comprehension:

>>> li=[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 2),('b', 3)]
>>> di={};{di.setdefault(k[0],[]).append(k[1]) for k in li}
set([None])
>>> di
{'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]}

If you really want to go overboard into the unreadable:

>>> {k1:[e for _,e in v1] for k1,v1 in {k:filter(lambda x: x[0]==k,li) for k,v in li}.items()}
{'a': [1, 2], 'c': [3], 'b': [2, 3], 'd': [4]}

You don't want to do that. Use the for loop, Luke!

dawg
2
>>> kvs = [(1,2), (2,3), (1,3)]
>>> reduce(
...   lambda d,(k,v): d[k].append(v) or d,
...   kvs,
...   defaultdict(list))
defaultdict(<type 'list'>, {1: [2, 3], 2: [3]})
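
The snippet above relies on two Python 2 features: the built-in `reduce` and tuple unpacking in the lambda's parameter list, which Python 3 no longer accepts. A hedged Python 3 sketch of the same trick imports `reduce` from `functools` and indexes the pair instead (the parameter names `acc` and `kv` are illustrative):

```python
from collections import defaultdict
from functools import reduce

kvs = [(1, 2), (2, 3), (1, 3)]

# list.append returns None, so "append(...) or acc" evaluates to acc,
# which becomes the accumulator for the next step.
d = reduce(
    lambda acc, kv: acc[kv[0]].append(kv[1]) or acc,
    kvs,
    defaultdict(list))

print(dict(d))  # {1: [2, 3], 2: [3]}
```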
user471651