17

I want to iterate over two lists. The first list contains some browser user-agents and the second list contains the versions of those browsers. I want to keep only those user-agents whose version is greater than 60.

Here is how my list comprehension looks:

[link for ver in version for link in useragents if ver > 60]

The problem with this comprehension is that it returns the same user-agent multiple times. I wrote the following loop using the zip function, which works fine:

for link, ver in zip(useragents, version):
    if ver > 60:
        # append to list
        print(link)

Why is my list comprehension returning unexpected results?

jpp
Viktor

5 Answers

28

Your first list comprehension is equivalent to:

res = []
for ver in version:
    for link in useragents:
        if ver > 60:
            res.append(link)

Notice that you have a nested loop with time complexity O(n²), i.e. you are iterating over every combination of version and useragents. That's not what you want, assuming your version and useragents lists are aligned by index.

The equivalent of your for loop is the following list comprehension:

res = [link for link, ver in zip(useragents, version) if ver > 60]
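To see the difference concretely, here is a minimal sketch with made-up sample data (the values in useragents and version below are assumptions for illustration, not the asker's actual lists):

```python
# Hypothetical sample data: the two lists are aligned by index.
useragents = ["Firefox", "Chrome", "Opera"]
version = [58, 70, 65]

# Nested comprehension: every version is paired with every user-agent.
nested = [link for ver in version for link in useragents if ver > 60]

# zip: each user-agent is paired only with its own version.
zipped = [link for link, ver in zip(useragents, version) if ver > 60]

print(nested)  # ['Firefox', 'Chrome', 'Opera', 'Firefox', 'Chrome', 'Opera']
print(zipped)  # ['Chrome', 'Opera']
```

In the nested form, the whole useragents list is emitted once for each version above 60, which is where the repeated entries come from.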
jpp
9
[link for (link, ver) in zip(useragents, version) if ver > 60]

You still have to zip the two lists together.

shuebner
8

This

[link for ver in version for link in useragents if ver > 60]

is not the same as zip. It's not iterating through the two sequences in parallel. It's iterating through all combinations of those two sequences.

It is as if you wrote:

res = []
for ver in version:
    for link in useragents:
        if ver > 60:
            res.append(link)

So if both sequences had length 5, there would be 25 combinations (some of which are filtered out by the condition ver > 60).
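The counting argument can be checked with a short sketch. The data below is hypothetical (five placeholder user-agents and five made-up versions), and itertools.product is used only to make the "all combinations" behaviour explicit:

```python
from itertools import product

# Hypothetical data: five user-agents and five aligned versions.
useragents = ["ua0", "ua1", "ua2", "ua3", "ua4"]
version = [55, 61, 59, 72, 60]

# The nested comprehension visits the same pairs as itertools.product.
pairs = list(product(version, useragents))
print(len(pairs))  # 25

# Two versions exceed 60, and each is paired with all 5 user-agents.
nested = [link for ver in version for link in useragents if ver > 60]
print(len(nested))  # 10

# zip walks the two lists in parallel, producing one pair per position.
zipped = list(zip(useragents, version))
print(len(zipped))  # 5
```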

When you want to go through sequences in parallel, zip is the way to do it, even in a comprehension.

[link for (link, ver) in zip(useragents, version) if ver > 60]
khelwood
2

Alternatively, you can use itertools.compress() in combination with map(), where map() evaluates the condition for each version and compress() keeps the matching user-agents:

from itertools import compress

filter_ = map(lambda x: x > 60, version)
list(compress(useragents, filter_))

Example:

s = 'ABCDEFG'
nums = range(len(s))
    
filter_ = map(lambda x: x > 3, nums)
print(list(compress(s, filter_)))
# ['E', 'F', 'G']
Grisha Levit
Mykola Zotko
1

I can't be sure what's happening without your data, but in general a "double" list comprehension is not the same as zip; it is a double loop, i.e.

[a for b in bs for a in as_]

is equivalent to

lst = []
for b in bs:
    for a in as_:
        lst.append(a)
Slam