(Set) Comprehension from multiple values

Question

Suppose I’ve got a list l = [1,2,3] and I want to create a set of all numbers in that list and their squares. Ideally, in a single comprehension expression.

The best I can come up with takes two iterations over the list:

set(_ for _ in l).union(_ * _ for _ in l)

I think it is clearer than a nested comprehension. BTW, it can be simplified to `{_ for _ in l} | {_ ** 2 for _ in l}` — Sraw, Dec 14 '17 at 01:56
@Sraw That’s synonymous with the explicit `union()` function and still requires two iterations over the list. — Jens, Dec 14 '17 at 03:04
I know. Um, it is difficult to explain. There was an answer that gave a nested comprehension, and I thought it looked confusing even though it was shorter. But now that answer has been deleted! Of course what I wrote is equivalent; I just wanted to say it can be shorter and even clearer. — Sraw, Dec 14 '17 at 03:13
@Sraw Still iterates twice over `l` ;-) Take a look at the answer below. — Jens, Dec 14 '17 at 03:36
As an aside, `_` is conventionally used for *throwaway* variables, e.g. `[1 for _ in range(5)]`, it will be confusing to other Python programmers if you use something like `[_ for _ in l]`. Furthermore, `set(x for x in l)` => `set(l)`. — juanpa.arrivillaga, Dec 14 '17 at 03:47
Actually, in my tests, `set(l + [i ** 2 for i in l])` is faster than `{y for x in l for y in (x, x**2)}`, using `timeit` with 1000000 iterations. — Sraw, Dec 14 '17 at 03:50
@Sraw Interesting, could you please make your observation an additional answer? Thanks! — Jens, Dec 14 '17 at 03:53

score 4 · Accepted Answer · answered Dec 14 '17 at 03:21

Your own code can be shortened to:

set(l).union(x**2 for x in l)

Here I renamed _ to x, because _ conventionally signals that the value is not important, whereas here it is used.

Strictly speaking you're still iterating over the list twice, but the first time implicitly.

If you insist on iterating only once, you'd get this:

{y for x in l for y in (x, x**2)}

which is a comprehension with two for clauses, equivalent to the following:

result = set()
for x in l:
    for y in (x, x**2):
        result.add(y)
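
As a quick sanity check (a sketch, not part of the original answer), the variants above can be compared on the question's sample list. The names `via_union`, `via_comprehension`, and `via_loop` are only illustrative:

```python
l = [1, 2, 3]

# Two passes over l: one implicit in set(l), one in the generator expression.
via_union = set(l).union(x**2 for x in l)

# One pass over l: the inner for clause yields x and then x**2 for each element.
via_comprehension = {y for x in l for y in (x, x**2)}

# The explicit nested loop that the comprehension corresponds to.
via_loop = set()
for x in l:
    for y in (x, x**2):
        via_loop.add(y)

print(via_union == via_comprehension == via_loop)  # True
print(sorted(via_comprehension))  # [1, 2, 3, 4, 9]
```

All three build the same set; they differ only in how many times `l` is traversed and in readability.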

Thijs van Dien


That’s that funky double-nested comprehension that I was trying for – Jens Dec 14 '17 at 03:24
@Jens The order of the `for`s is always a gotcha. – Thijs van Dien Dec 14 '17 at 03:26
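
To illustrate the gotcha mentioned in the comment above: the `for` clauses of a comprehension are written in the same order as the equivalent nested `for` statements, outermost first. A small sketch, assuming no variable named `x` already exists in the enclosing scope:

```python
l = [1, 2, 3]

# Outer loop first, inner loop second, exactly as in nested for statements.
pairs = [y for x in l for y in (x, x**2)]
print(pairs)  # [1, 1, 2, 4, 3, 9]

# Swapping the clauses fails: the first iterable refers to x before any
# clause has bound it (assuming x is not defined elsewhere).
try:
    [y for y in (x, x**2) for x in l]
except NameError as exc:
    print("NameError:", exc)
```

A list is used here instead of a set only so that the iteration order is visible in the output.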

Sraw · Answer 2 · 2017-12-14T04:20:47.517

IMO, set(l + [i ** 2 for i in l]) is a better solution. It is clearer than a nested comprehension.

I also ran a benchmark:

import timeit
l = list(range(5))
print(timeit.timeit("set(l + [_ ** 2 for _ in l])", 'from __main__ import ' + ', '.join(globals())))
print(timeit.timeit("{y for x in l for y in (x, x**2)}", 'from __main__ import ' + ', '.join(globals())))

output:

3.0309128219996637
3.1958301850008866

It shows that set(l + [i ** 2 for i in l]) is a little faster. I think the reason is that the nested comprehension has to create an intermediate tuple (x, x**2) on every iteration, which slows it down.

Update (same benchmark with a list of 200000 elements):

import timeit
l = list(range(200000))
print(timeit.timeit("set(l + [_ ** 2 for _ in l])", 'from __main__ import ' + ', '.join(globals()), number=100))
print(timeit.timeit("{y for x in l for y in (x, x**2)}", 'from __main__ import ' + ', '.join(globals()), number=100))

output:

16.46792753900081
19.72252997099895

For a list this small, the results are meaningless. When it grows large, the fact that you're constructing the whole list first _could_ become a factor. More interesting would be a benchmark with a list of 100000+ elements. — Thijs van Dien, Dec 14 '17 at 04:13
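
Following up on the suggestion to benchmark larger lists, here is a minimal sketch that times both expressions across several list sizes. The function names, the chosen sizes, and the `compare` helper are assumptions made for illustration, and the timings will vary by machine and Python version, so no expected output is shown:

```python
import timeit


def concat_then_set(l):
    # Builds a temporary list of squares, concatenates it to l, then builds the set.
    return set(l + [i ** 2 for i in l])


def nested_comprehension(l):
    # Single pass over l; creates a small (x, x**2) tuple for every element.
    return {y for x in l for y in (x, x**2)}


def compare(sizes=(5, 1000, 100000), repeat=3, number=10):
    """Return the best total time in seconds per size and per function."""
    results = {}
    for n in sizes:
        l = list(range(n))
        results[n] = {
            f.__name__: min(timeit.repeat(lambda: f(l), repeat=repeat, number=number))
            for f in (concat_then_set, nested_comprehension)
        }
    return results


if __name__ == "__main__":
    for n, timings in compare().items():
        print(n, timings)
```

Taking the minimum of several repeats is the usual way to reduce noise with `timeit`; the lambda wrapper adds a small constant call overhead to both candidates equally.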
