113

I would like to know why this is valid:

set(range(10)) - set(range(5))

but this is not valid:

set(range(10)) + set(range(5))

Is it because '+' could mean both intersection and union?

David Heffernan
badzil
  • 7
    `|` means union. What are you asking? – S.Lott Oct 07 '11 at 20:03
  • 16
    It's because Guido chose different operators for intersection and union. – David Heffernan Oct 07 '11 at 20:06
  • 4
    @David Heffernan, Guido doesn't usually do things without a reason or at least some guiding principle - that's what makes Python so great. – Mark Ransom Oct 07 '11 at 20:11
  • 1
    If only `~` were a binary operator, then you could have `|` for + union, and `~` for difference, which is much more balanced. – Matt Joiner Oct 08 '11 at 08:00
  • 1
    There is a long standing Python issue that `set.__or__` does not have an appropriate "Does same as set.union()..." docstring, nor do the other operators. It causes unnecessary questions but the process is moribund. :( – Charles Merriam Jan 13 '20 at 00:44

6 Answers

139

Python sets don't have an implementation for the + operator.

You can use | for set union and & for set intersection.

Sets do implement - as set difference. You can also use ^ for symmetric set difference (i.e., it returns a new set containing only the objects that appear in exactly one of the two sets).
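As a quick illustration of the operators listed above, here is a minimal sketch using the two sets from the question. The variable names are chosen only for this example.

```python
a = set(range(10))  # {0, 1, ..., 9}
b = set(range(5))   # {0, 1, 2, 3, 4}

difference = a - b            # elements of a that are not in b
union = a | b                 # elements in a or in b (or both)
intersection = a & b          # elements in both a and b
symmetric_difference = a ^ b  # elements in exactly one of a and b

# `+` is not defined for sets, so the expression raises TypeError.
try:
    a + b
    plus_error = None
except TypeError as exc:
    plus_error = exc

print(difference)
print(union)
print(intersection)
print(symmetric_difference)
print(type(plus_error).__name__)
```

The named methods (`a.union(b)`, `a.intersection(b)`, `a.difference(b)`, `a.symmetric_difference(b)`) do the same work and additionally accept any iterable as an argument, whereas the operators require both operands to be sets.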

Platinum Azure
114

Python chose to use | instead of + because set union is closely related to boolean disjunction. Bit vectors (which in Python are just int/long) define this operation across a sequence of boolean values and call it "bitwise or". In fact, this operation is so similar to set union that binary integers are sometimes also called "bit sets", where the elements of the set are taken to be natural numbers.

Because int already defines set-like operators as |, & and ^, it was natural for the newer set type to use the same interface.
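The correspondence between integers and sets can be sketched as follows. The helpers `to_bits` and `from_bits` are hypothetical names made up for this illustration; they are not part of Python or of the set type.

```python
def to_bits(s):
    """Encode a set of non-negative ints as an int: bit n is 1 iff n is in s."""
    bits = 0
    for n in s:
        bits |= 1 << n
    return bits


def from_bits(bits):
    """Decode a non-negative int back into the set of its 1-bit positions."""
    return {n for n in range(bits.bit_length()) if (bits >> n) & 1}


a = {0, 1, 2, 3}
b = {2, 3, 4, 5}
a_bits = to_bits(a)  # 0b001111
b_bits = to_bits(b)  # 0b111100

# The int operators and the set operators agree under this encoding.
assert from_bits(a_bits | b_bits) == a | b   # union / bitwise or
assert from_bits(a_bits & b_bits) == a & b   # intersection / bitwise and
assert from_bits(a_bits ^ b_bits) == a ^ b   # symmetric difference / xor
assert from_bits(a_bits & ~b_bits) == a - b  # difference / and-not
```

Note that difference is the one case where the spelling differs: on integers `-` is arithmetic subtraction, so the bit-set equivalent of `a - b` is `a_bits & ~b_bits`.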

SingleNegationElimination
48

In set theory the + symbol normally indicates the disjoint union of two sets. If A and B are sets, their disjoint union is defined to be the set

A + B = {(a, 1) | a in A} U {(b, 2) | b in B}

i.e., to construct the disjoint union, we mark all elements of A and all elements of B with different tags (in the example I used the numbers 1 and 2, but any two different "things" would do the job) and then take the union of the two resulting sets. In the above example, I have used 'U' for set union to make it more similar to the usual mathematical notation; below I use the Python notation, i.e. '|' for union, and '&' for intersection.

If A and B are disjoint, then A + B has a 1-to-1 correspondence with A | B. If they are not, then every common element x in A & B appears twice in A + B: once as (x, 1), and once as (x, 2).

So, since the '+' symbol has a quite well-established meaning as a set operation, I find it very consistent that Python does not use this symbol for set union or intersection. The Python designers probably had this in mind when they chose the set operators.
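The tagging construction described above can be sketched in a few lines. The function name `disjoint_union` is an assumption made for this example; Python's set type provides no such operation.

```python
def disjoint_union(a, b):
    """Tag elements of a with 1 and elements of b with 2, then take the union."""
    return {(x, 1) for x in a} | {(x, 2) for x in b}


a = {1, 2, 3}
b = {3, 4}
tagged = disjoint_union(a, b)

# The shared element 3 survives twice, once per tag.
print((3, 1) in tagged, (3, 2) in tagged)

# The disjoint union always has len(a) + len(b) elements,
# while the ordinary union merges the common ones.
print(len(tagged), len(a | b))
```

With these inputs the disjoint union has five elements and the ordinary union has four, which is exactly the difference between A + B and A | B described in the answer.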

Giorgio
  • 10
    **This is the optimal answer.** Until reading this response, I grokked why Guido overloaded the `|` operator for set unions but failed to grok why Guido avoided overloading the `+` operator for set unions as well. After all, doing so would have preserved orthogonality with the `+` operator overloaded for list additions. Since Python's hallmark is conformance with mathematical notation (e.g., `j` denoting the imaginary component of complex numbers), Guido's curious choice finally makes sense. – Cecil Curry Feb 07 '18 at 07:29
24

Sure, they could have used + for union, but then they would still need a symbol for intersection. | for union is symmetrical with & for intersection and thus makes a better choice.

Mark Ransom
13

Because | means union and & means intersection. There's clearly no reason to add multiple operators for the same function.

The reason for using | and & probably goes back to bitwise operations. If you represent a set as the bits in a number, those are the operators you'd use for union and intersection.

+ simply isn't as tied to union as - is to set difference.

Winston Ewert
5

Because set difference is a very useful and commonly known concept, but there's no (universally used) concept of „set addition“.

Petr Viktorin
  • 2
    Union? When was the last time you heard somebody say „set addition“ instead of „union“, or use + instead of ∪? Sometimes `+` is defined as [member-wise addition](http://goo.gl/ZIm6y). Some use it for [symmetric difference](http://www.cut-the-knot.org/do_you_know/add_set.shtml). Either way, any paper that uses it either calls it something else or defines it first. – Petr Viktorin Oct 08 '11 at 10:28
  • 2
    Someone might refer to it as 'set addition' if they don't know the proper term. Obviously people who know the term 'union' use the term 'union.' – fluffy Oct 18 '11 at 17:40