1

Suppose I’ve got a list l = [1,2,3] and I want to create a set of all numbers in that list and their squares, ideally in a single comprehension expression.

The best I can come up with requires two iterations over the list:

set(_ for _ in l).union(_ * _ for _ in l)
Jens
  • I think it is clearer than a nested generator comprehension. BTW, it can be simplified to `{_ for _ in l} | {_ ** 2 for _ in l}` – Sraw Dec 14 '17 at 01:56
  • @Sraw That’s synonymous with the explicit `union()` function and still requires two iterations over the list. – Jens Dec 14 '17 at 03:04
  • I know. It is difficult to explain. There was an answer that gave a nested generator comprehension, and I thought it looked confusing even though it was shorter. But now that answer has been deleted! What I wrote is certainly synonymous; I just wanted to say that it can be shorter and even clearer. – Sraw Dec 14 '17 at 03:13
  • Actually I find a solution: `{*(l + [_ ** 2 for _ in l])}`. – Sraw Dec 14 '17 at 03:22
  • @Sraw Still iterates twice over `l` ;-) Take a look at the answer below. – Jens Dec 14 '17 at 03:36
  • That answer is the same as the deleted answer. – Sraw Dec 14 '17 at 03:45
  • As an aside, `_` is conventionally used for *throwaway* variables, e.g. `[1 for _ in range(5)]`, it will be confusing to other Python programmers if you use something like `[_ for _ in l]`. Furthermore, `set(x for x in l)` => `set(l)`. – juanpa.arrivillaga Dec 14 '17 at 03:47
  • Actually, in my tests, `set(l + [i ** 2 for i in l])` is faster than `{y for x in l for y in (x, x**2)}`, using `timeit` with 1000000 iterations. – Sraw Dec 14 '17 at 03:50
  • @Sraw Interesting, could you please make your observation an additional answer? Thanks! – Jens Dec 14 '17 at 03:53

2 Answers

4

Your own code can be shortened to:

set(l).union(x**2 for x in l)

Here I renamed _ to x, because _ conventionally signals that the value is not used, whereas here it is.

Strictly speaking you're still iterating over the list twice, but the first time implicitly.

If you insist on iterating only once, you'd get this:

{y for x in l for y in (x, x**2)}

which is a nested comprehension equivalent to the following:

result = set()
for x in l:
    for y in (x, x**2):
        result.add(y)
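
As a quick sanity check, the sketch below confirms that the two-pass version, the nested comprehension, and the explicit loops all build the same set, and it counts how often iteration over the list is started. `CountingList` is a hypothetical helper introduced only for this illustration; it is not part of the original code.

```python
class CountingList(list):
    """Hypothetical helper: a list that counts how often iteration over it starts."""

    def __init__(self, *args):
        super().__init__(*args)
        self.iterations = 0

    def __iter__(self):
        self.iterations += 1
        return super().__iter__()


l = CountingList([1, 2, 3])

# Two passes: set(l) iterates implicitly, the generator iterates explicitly.
two_pass = set(l).union(x**2 for x in l)
two_pass_count = l.iterations

l.iterations = 0

# One pass: each x contributes both itself and its square.
one_pass = {y for x in l for y in (x, x**2)}
one_pass_count = l.iterations

# The explicit loops, for comparison.
result = set()
for x in l:
    for y in (x, x**2):
        result.add(y)

print(two_pass == one_pass == result)  # True
print(two_pass_count, one_pass_count)  # 2 1
```

The counter only tracks calls to `__iter__`, so it says nothing about approaches that copy the list without iterating it in Python, such as list concatenation.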
Thijs van Dien
0

IMO, set(l + [i ** 2 for i in l]) is a better solution: it is clearer than a nested comprehension.

I also ran a benchmark:

import timeit
l = list(range(5))
print(timeit.timeit("set(l + [_ ** 2 for _ in l])", 'from __main__ import ' + ', '.join(globals())))
print(timeit.timeit("{y for x in l for y in (x, x**2)}", 'from __main__ import ' + ', '.join(globals())))

output:

3.0309128219996637
3.1958301850008866

This shows that set(l + [i ** 2 for i in l]) is a little faster. I think the reason is that the nested comprehension needs to create a tuple (x, x**2) on every iteration, which slows it down.

Update (with a larger list):

import timeit
l = list(range(200000))
print(timeit.timeit("set(l + [_ ** 2 for _ in l])", 'from __main__ import ' + ', '.join(globals()), number=100))
print(timeit.timeit("{y for x in l for y in (x, x**2)}", 'from __main__ import ' + ', '.join(globals()), number=100))

output:

16.46792753900081
19.72252997099895
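
A variant of the benchmark above, as a rough sketch: it hands the list to `timeit` through its `globals` parameter instead of building an import string, and it takes the minimum of several repeats to reduce noise. The `bench` helper, the list size, and the `number`/`repeat` defaults are assumptions made for this illustration, and the timings will differ by machine and Python version.

```python
import timeit


def bench(stmt, data, number=200, repeat=5):
    """Hypothetical helper: best total time in seconds for `number` runs of stmt."""
    times = timeit.repeat(stmt, globals={"l": data}, number=number, repeat=repeat)
    return min(times)


data = list(range(1000))
concat_time = bench("set(l + [i ** 2 for i in l])", data)
nested_time = bench("{y for x in l for y in (x, x**2)}", data)

print(f"concatenation:        {concat_time:.4f} s")
print(f"nested comprehension: {nested_time:.4f} s")
```

Taking the minimum rather than the mean follows the usual `timeit` advice: higher values are typically caused by other processes interfering, not by the code being measured.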
Sraw
  • For a list this small, the results are meaningless. When it grows large, the fact that you're constructing the whole list first _could_ become a factor. More interesting would be a benchmark with a list of 100000+ elements. – Thijs van Dien Dec 14 '17 at 04:13
  • @ThijsvanDien Updated. – Sraw Dec 14 '17 at 04:20