16

I have a list of dictionaries like the following:

lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]

I wrote a generator expression like:

next((itm for itm in lst if itm['a']==5))

Now the strange part is that although this works for the key-value pair of 'a', it throws an error for every other key. Expression:

next((itm for itm in lst if itm['b']==6))

Error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
KeyError: 'b'
Willem Van Onsem
Apurva Kunkulol

5 Answers

32

That's not weird. For every itm in lst, the generator first evaluates the filter clause. If the filter clause is itm['b'] == 6, it tries to fetch the 'b' key from that dictionary. Since the first dictionary has no such key, the lookup raises a KeyError.

For the first filter example, that is not a problem, since the first dictionary has an 'a' key. The next(..) is only interested in the first element emitted by the generator. So it never asks to filter more elements.
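
To make this visible, here is a small sketch that records which dictionaries the filter actually tests. The helper has_value and the visited list are hypothetical names introduced only for this illustration.

```python
lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]
visited = []  # records every dictionary the filter is asked to test

def has_value(itm, key, value):
    """Hypothetical tracing filter: log the item, then do the strict lookup."""
    visited.append(itm)
    return itm[key] == value

# The first dictionary matches, so next() stops after testing one item.
first = next(itm for itm in lst if has_value(itm, 'a', 5))
visited_for_a = list(visited)

# The first dictionary has no 'b' key, so the lookup fails on that same item.
visited.clear()
try:
    next(itm for itm in lst if has_value(itm, 'b', 6))
    failed_key = None
except KeyError as exc:
    failed_key = exc.args[0]
visited_for_b = list(visited)
```

In both cases only the first dictionary is ever tested: once because it matched, once because the lookup on it failed.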

You can use .get(..) here to make the lookup more failsafe:

next((itm for itm in lst if itm.get('b',None)==6))

In case the dictionary has no such key, the .get(..) part will return None. And since None is not equal to 6, the filter will thus omit the first dictionary and look further for another match. Note that if you do not specify a default value, None is the default value, so an equivalent statement is:

next((itm for itm in lst if itm.get('b')==6))

We can also omit the parentheses around the generator expression; the additional parentheses are only needed when the call has multiple arguments:

next(itm for itm in lst if itm.get('b')==6)
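
One case where the call does have multiple arguments is passing a default to next(..), which is returned instead of raising StopIteration when nothing matches. A minimal sketch (the key 'z' is just an example of a key that no dictionary has):

```python
lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]

# With a second argument, the generator expression needs its own parentheses.
match = next((itm for itm in lst if itm.get('b') == 6), None)

# No dictionary satisfies this filter, so the default is returned.
missing = next((itm for itm in lst if itm.get('z') == 0), None)
```

Here match is the dictionary {'b': 6} and missing is None.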
Willem Van Onsem
  • maybe just `itm.get('b') == 6` (`None` is the default anyway) – Chris_Rands Jul 05 '17 at 08:15
  • @Chris_Rands: yes, but the aim was to make the `None` explicit here. Since otherwise one wonders where the `None` originates from. But I will add it to the answer :). – Willem Van Onsem Jul 05 '17 at 08:17
  • @WillemVanOnsem Thanks for that descriptive answer. I have another question though. Since there is a for loop in the expression, I was expecting that if a mismatch occurs, the expression will take the next element in the list. Why does that not happen with "d[x]" and happens with d.get("x") – Apurva Kunkulol Jul 05 '17 at 08:45
  • @ApurvaKunkulol: because the first one results in an error. If code raises an error, the execution flow is aborted, and the call stack is *unrolled* until there is a catching mechanism that deals with the error. In case of `d.get('x')`, there is no such error, since `None` is returned if the key is missing. This lets the normal code path continue, which is fetching the next `itm` and checking the filter on that `itm`. – Willem Van Onsem Jul 05 '17 at 08:47
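
The "catching mechanism" mentioned in the comment can also be supplied by hand. As a rough sketch, a hypothetical helper named matches (not part of the original answer) catches the KeyError itself, so the generator simply moves on to the next item:

```python
lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]

def matches(itm, key, value):
    """Hypothetical helper: treat a missing key as 'no match' instead of an error."""
    try:
        return itm[key] == value
    except KeyError:
        # The error is handled here, so it never aborts the generator.
        return False

found = next(itm for itm in lst if matches(itm, 'b', 6))
```

This behaves like the .get(..) version for this data; .get(..) is shorter, while the explicit try/except shows where the error is handled.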
15

Take a look at your generator expression separately:

(itm for itm in lst if itm['a']==5)

This lazily yields every item in the list for which itm['a'] == 5; nothing is evaluated until an item is requested. So far so good.

When you call next() on it, you tell Python to generate the first item from that generator expression. But only the first.

So when you have the condition itm['a'] == 5, the generator will take the first element of the list, {'a': 5} and perform the check on it. The condition is true, so that item is generated by the generator expression and returned by next().

Now, when you change the condition to itm['b'] == 6, the generator will again take the first element of the list, {'a': 5}, and attempt to get the element with the key b. This will fail:

>>> itm = {'a': 5}
>>> itm['b']
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    itm['b']
KeyError: 'b'

It does not even get the chance to look at the second element because it already fails while trying to look at the first element.

To solve this, you have to avoid using an expression that can raise a KeyError here. You could use dict.get() to attempt to retrieve the value without raising an exception:

>>> lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]
>>> next((itm for itm in lst if itm.get('b') == 6))
{'b': 6}
poke
6

Obviously itm['b'] will raise a KeyError if there is no 'b' key in a dictionary. One way would be to do

next((itm for itm in lst if 'b' in itm and itm['b']==6))

If you don't expect None in any of the dictionaries then you can simplify it to

next((itm for itm in lst if itm.get('b')==6))

(This works the same here because you compare to 6, but it would give the wrong result if you compared to None, since a missing key would then look like a match.)

or safely with a placeholder

PLACEHOLDER = object()
next((itm for itm in lst if itm.get('b', PLACEHOLDER)==6))
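
To see why the placeholder matters, here is a small sketch with made-up data (the records list and the name MISSING are assumptions for illustration) in which the value being searched for is None itself:

```python
records = [{'a': 5}, {'b': None}]

# Without a placeholder, a missing 'b' also yields None, so the wrong item matches.
wrong = next(itm for itm in records if itm.get('b') is None)

# A unique sentinel object can never be mistaken for a stored value.
MISSING = object()
right = next(itm for itm in records if itm.get('b', MISSING) is None)
```

Here wrong is {'a': 5}, which has no 'b' key at all, while right is {'b': None}.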
freakish
1

Indeed, your structure is a list of dictionaries.

>>> lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]

To get a better idea of what is happening with your first condition, try this:

>>> gen = (itm for itm in lst if itm['a'] == 5)
>>> next(gen)
{'a': 5}
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
KeyError: 'a'

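Each call to next resumes the generator, which keeps testing elements until one passes the filter and then returns that item. Here the second call moves on to {'b': 6}, which has no 'a' key, hence the KeyError. Also...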

next((itm for itm in lst if itm['a'] == 5))

This creates a generator that is not assigned to any variable, processes the first element in lst, sees that key 'a' does indeed exist and that its value is 5, and returns the item. The generator is then garbage collected. No error is raised because the first item in lst contains this key.

So, if you changed the key to be something that the first item does not contain, you get the error you saw:

>>> gen = (itm for itm in lst if itm['b'] == 6)
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
KeyError: 'b'

The Solution

Well, one solution as already discussed is to use the dict.get function. Here's another alternative using defaultdict:

from collections import defaultdict
from functools import partial

f = partial(defaultdict, lambda: None)

lst = [{'a': 5}, {'b': 6}, {'c': 7}, {'d': 8}]
lst = [f(itm) for itm in lst] # create a list of default dicts

for i in (itm for itm in lst if itm['b'] == 6):
    print(i)

This prints out:

defaultdict(<function <lambda> at 0x10231ebf8>, {'b': 6})

The defaultdict will return None in the event of the key not being present.
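
One side effect worth knowing about, shown here as a short sketch with illustrative variable names: looking up a missing key with square brackets on a defaultdict also stores that key, whereas .get(..) leaves the dictionary unchanged.

```python
from collections import defaultdict

by_index = defaultdict(lambda: None, {'a': 5})
value = by_index['b']          # missing key: the factory runs and 'b' is stored
keys_after_index = sorted(by_index)

by_get = defaultdict(lambda: None, {'a': 5})
got = by_get.get('b')          # .get() does not call the default factory
keys_after_get = sorted(by_get)
```

After the lookups, by_index contains both 'a' and 'b', while by_get still contains only 'a'. For a one-off search this may not matter, but it does change the dictionaries in lst.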

cs95
0

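Maybe you can try this:

next(itm for itm in lst for val in itm.values() if val == 6)

This searches by value instead of by key, so no key lookup can fail. The two for clauses form a single generator, so one next is enough. Nesting one generator inside another and calling next twice does not work here: the inner generator for the first dictionary is empty, so the inner next raises StopIteration.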

Hou Lu