
I have a dictionary mapping keys to lists. I'd like to iterate through the dictionary, take each list, filter it with a condition, and add the filtered list to a new dictionary, but only if at least one element passed the condition.

The function already works imperatively. Can I do the same functionally with list and dict comprehensions? The main blocker is that the outer dict comprehension has a conditional that needs the length of the inner list comprehension.

Here it is working imperatively:

filtered_prediction_dict = {}
for prediction, confidence_intervals in prediction_dict.items():
    filtered_confidence_intervals = []
    for i in confidence_intervals:
        if i > threshold:
            filtered_confidence_intervals.append(i)
    if len(filtered_confidence_intervals) >= 1:
        filtered_prediction_dict[prediction] = filtered_confidence_intervals

I was wondering if I could do the same thing functionally with comprehensions, something like this:

filtered_prediction_dict = {prediction: [i for i in confidence_intervals if i > threshold] for prediction, confidence_intervals in prediction_dict.items() if len(filtered_confidence_intervals) >= 1}

Of course, Python's linter points out that filtered_confidence_intervals, used in len(filtered_confidence_intervals) in the conditional, has not been defined.

Any way around this?

  • `"I was wondering if I could do the same thing functionally with comprehensions"` please don't, not if you want to understand your code 1 week from now – DeepSpace Sep 05 '19 at 08:19
  • Just make `filtered_confidence_intervals` a list comprehension, but leave the rest as is. – jonrsharpe Sep 05 '19 at 08:20
  • Were the answers somehow helpful? – j-i-l Sep 06 '19 at 11:06
  • Yes, I was hoping there was a meta way to avoid computing the list comprehension twice, but it seems unavoidable. The any() function is useful. – Hung-Ray Ho Sep 09 '19 at 06:08
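Computing the filtered list twice can in fact be avoided inside a comprehension. A minimal sketch with hypothetical sample data: iterating over a one-element list binds the filtered list to a name for each item, and on Python 3.8+ an assignment expression (the walrus operator) does the same in the if clause. Both build each filtered list exactly once.

```python
# Hypothetical sample data; the names mirror the question.
prediction_dict = {"cat": [0.2, 0.9, 0.7], "dog": [0.1, 0.3], "bird": []}
threshold = 0.5

# Trick 1: "for filtered in [...]" loops over a one-element list,
# which simply binds the filtered list to a name for this item.
single_pass = {
    prediction: filtered
    for prediction, confidence_intervals in prediction_dict.items()
    for filtered in [[ci for ci in confidence_intervals if ci > threshold]]
    if filtered  # an empty list is falsy, so this matches len(...) >= 1
}

# Trick 2 (Python 3.8+): the assignment expression binds the list in the
# if clause, which is evaluated before the key/value pair for each item.
walrus = {
    prediction: filtered
    for prediction, confidence_intervals in prediction_dict.items()
    if (filtered := [ci for ci in confidence_intervals if ci > threshold])
}

print(single_pass)  # {'cat': [0.9, 0.7]}
print(walrus)       # {'cat': [0.9, 0.7]}
```

The first trick works on any Python 3 version; the second reads more directly but needs 3.8 or later.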

2 Answers


You can combine the two conditions you apply to the confidence intervals into a single statement. Also, I recommend doing the filtering of the confidence intervals with a list comprehension in any case.

The two conditions:

  1. confidence interval > threshold (the if i > threshold)
  2. one or more confidence intervals are bigger than the threshold (the len(filtered_confidence_intervals) >= 1)

Expressed in a single statement:

  • any(ci > threshold for ci in confidence_intervals)
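To check that the any() expression matches the original length test, here is a small sketch comparing the two on hypothetical sample lists. The helper names are made up for illustration. One difference is that any() stops at the first value above the threshold instead of building a whole list.

```python
threshold = 0.5

def has_kept_value_len(confidence_intervals):
    # The question's condition: build the filtered list, then test its length.
    return len([ci for ci in confidence_intervals if ci > threshold]) >= 1

def has_kept_value_any(confidence_intervals):
    # The single-statement version: stops at the first ci > threshold.
    return any(ci > threshold for ci in confidence_intervals)

# Hypothetical sample lists, including an empty list and a value
# exactly equal to the threshold.
samples = [[0.2, 0.9], [0.1, 0.3], [], [0.5], [0.6]]
for cis in samples:
    assert has_kept_value_len(cis) == has_kept_value_any(cis)

print([has_kept_value_any(cis) for cis in samples])  # [True, False, False, False, True]
```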

The resulting comprehension version (a dict comprehension, split up for readability):

filtered_prediction_dict = {
    p: [ci for ci in cis if ci > threshold]  # only keep ci > threshold
    for p, cis in prediction_dict.items()  # iterate through the items
    if any(ci > threshold for ci in cis)  # only consider items with at least one ci > threshold
}
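A usage sketch: the comprehension is wrapped in a hypothetical function filter_predictions and checked against the question's loop on made-up data. The function names and data are illustrative assumptions.

```python
def filter_predictions(prediction_dict, threshold):
    # Dict comprehension from the answer: keep ci > threshold, drop empty results.
    return {
        p: [ci for ci in cis if ci > threshold]
        for p, cis in prediction_dict.items()
        if any(ci > threshold for ci in cis)
    }

def filter_predictions_loop(prediction_dict, threshold):
    # The question's imperative version, for comparison.
    filtered_prediction_dict = {}
    for prediction, confidence_intervals in prediction_dict.items():
        filtered_confidence_intervals = []
        for i in confidence_intervals:
            if i > threshold:
                filtered_confidence_intervals.append(i)
        if len(filtered_confidence_intervals) >= 1:
            filtered_prediction_dict[prediction] = filtered_confidence_intervals
    return filtered_prediction_dict

# Hypothetical data.
preds = {"cat": [0.2, 0.9, 0.7], "dog": [0.1, 0.3], "bird": [0.55]}
result = filter_predictions(preds, 0.5)
assert result == filter_predictions_loop(preds, 0.5)
print(result)  # {'cat': [0.9, 0.7], 'bird': [0.55]}
```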

IMHO this is no less readable than the for loops, but I guess this is a matter of taste and use.


If you want to keep the for loop:

filtered_prediction_dict = {}
for prediction, confidence_intervals in prediction_dict.items():
    if any(ci > threshold for ci in confidence_intervals):
        filtered_prediction_dict[prediction] = [ci for ci in confidence_intervals if ci > threshold]
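If you keep the loop, you can also build the filtered list once and then test it, along the lines of jonrsharpe's comment. An empty list is falsy, so `if filtered:` does the job of `len(...) >= 1`. A sketch with hypothetical data:

```python
prediction_dict = {"cat": [0.2, 0.9, 0.7], "dog": [0.1, 0.3]}  # hypothetical data
threshold = 0.5

filtered_prediction_dict = {}
for prediction, confidence_intervals in prediction_dict.items():
    # Filter once with a list comprehension...
    filtered = [ci for ci in confidence_intervals if ci > threshold]
    # ...and keep the key only if something survived (empty lists are falsy).
    if filtered:
        filtered_prediction_dict[prediction] = filtered

print(filtered_prediction_dict)  # {'cat': [0.9, 0.7]}
```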

A note on your remark that Python's linter flags filtered_confidence_intervals as not yet defined:

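Very often linters are quite accurate, and this case is no exception. In your loop, filtered_confidence_intervals is assigned once per item of prediction_dict. In the comprehension, however, it is never assigned at all: the inner list comprehension produces the filtered list without giving it a name. So there is no way to iterate through prediction_dict and test the length of filtered_confidence_intervals in the if clause.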
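The linter's warning also matches what happens at runtime. A small sketch with hypothetical data runs the question's comprehension and catches the resulting NameError:

```python
prediction_dict = {"cat": [0.2, 0.9]}  # hypothetical data
threshold = 0.5

caught = None
try:
    # The question's comprehension: nothing ever assigns
    # filtered_confidence_intervals, so evaluating the if clause fails.
    filtered_prediction_dict = {
        prediction: [i for i in confidence_intervals if i > threshold]
        for prediction, confidence_intervals in prediction_dict.items()
        if len(filtered_confidence_intervals) >= 1
    }
except NameError as err:
    caught = err

print(caught)
```

The error only appears when prediction_dict has at least one item, because the if clause is never evaluated for an empty dictionary.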

You would need to replace the expression:

len(filtered_confidence_intervals) >= 1

in the dict comprehension with

len([ci for ci in confidence_intervals if ci > threshold]) >= 1
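With that replacement, the question's comprehension runs as written, as the sketch below shows on hypothetical data. Note that it builds each filtered list twice: once for the value and once for the length check.

```python
prediction_dict = {"cat": [0.2, 0.9, 0.7], "dog": [0.1, 0.3]}  # hypothetical data
threshold = 0.5

filtered_prediction_dict = {
    prediction: [i for i in confidence_intervals if i > threshold]
    for prediction, confidence_intervals in prediction_dict.items()
    # The inner list comprehension is repeated here instead of referring to a name.
    if len([ci for ci in confidence_intervals if ci > threshold]) >= 1
}
print(filtered_prediction_dict)  # {'cat': [0.9, 0.7]}
```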
j-i-l

You can use:

filtered_prediction_dict = {prediction: [i for i in confidence_intervals if i > threshold] for prediction, confidence_intervals in prediction_dict.items() if any(e > threshold for e in confidence_intervals)}

This way, filtered_prediction_dict never contains an empty list.
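The check in the if clause has to use the same comparison as the filter. Otherwise, a key whose values only equal threshold ends up mapped to an empty list. A sketch with hypothetical data, where one list contains only the threshold value, shows the difference:

```python
# Hypothetical data: "dog" sits exactly at the threshold.
prediction_dict = {"cat": [0.9], "dog": [0.5]}
threshold = 0.5

# Mismatched: the filter keeps i > threshold, but the check tests e >= threshold.
mismatched = {
    p: [i for i in cis if i > threshold]
    for p, cis in prediction_dict.items()
    if any(e >= threshold for e in cis)
}
# Matched: both use the same strict comparison.
matched = {
    p: [i for i in cis if i > threshold]
    for p, cis in prediction_dict.items()
    if any(e > threshold for e in cis)
}
print(mismatched)  # {'cat': [0.9], 'dog': []}
print(matched)     # {'cat': [0.9]}
```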

Or you can use:

filtered_prediction_dict = {prediction: [i for i in confidence_intervals if i > threshold] for prediction, confidence_intervals in prediction_dict.items() if max(confidence_intervals, default=threshold) > threshold}

The second version iterates twice over every element of each list, once for max() and once for the filter. Its default=threshold argument keeps max() from raising ValueError on an empty list. The first version also has some redundant iterations, although any() stops at the first match. Even so, both solutions may be faster than using for statements.
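Whether either one-liner actually beats the for loop depends on your data, so it is worth measuring on your own inputs. Below is a rough timeit sketch. The random data, its size, and the function names are hypothetical, and no particular timing result is implied.

```python
import random
import timeit

random.seed(0)
# Hypothetical data: 1,000 keys, each with 20 random confidence values.
prediction_dict = {k: [random.random() for _ in range(20)] for k in range(1000)}
threshold = 0.9

def with_loop():
    # The question's imperative version.
    out = {}
    for p, cis in prediction_dict.items():
        kept = []
        for i in cis:
            if i > threshold:
                kept.append(i)
        if len(kept) >= 1:
            out[p] = kept
    return out

def with_any():
    return {p: [i for i in cis if i > threshold]
            for p, cis in prediction_dict.items()
            if any(e > threshold for e in cis)}

def with_max():
    return {p: [i for i in cis if i > threshold]
            for p, cis in prediction_dict.items()
            if max(cis, default=threshold) > threshold}

# All three must agree before their timings are worth comparing.
assert with_loop() == with_any() == with_max()
for fn in (with_loop, with_any, with_max):
    print(fn.__name__, timeit.timeit(fn, number=50))
```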

kederrac