How does 'greater than or equal' comparison between sets of characters work?

Question

I was looking through the NumPy source code here, and I found the following piece of code, which ensures that the user is not passing both 'C' and 'F' in the requirements argument.

if requirements >= {'C', 'F'}:
    raise ValueError('Cannot specify both "C" and "F" order')

I was intrigued by how >= could be used for such a check. I found this answer on SO, and this section in the Python docs. From these I learned that the comparison uses lexicographical order: the first two elements are compared; if they are equal, the next pair is compared, and so on. The first pair that differs determines which sequence is the lesser.

When I try this with an example where the user passes the arguments 'A', 'C', 'F', a list or tuple behaves as I expected, but a set gives a different result:

['A', 'C', 'F'] >= ['C', 'F']  # False
('A', 'C', 'F') >= ('C', 'F')  # False
{'A', 'C', 'F'} >= {'C', 'F'}  # True

I tried to research lexical ordering specifically for sets, but since set is also used in the "mathematical" sense, most examples and articles I find use lists. So why is the comparison between sets different?

Because it's explicitly implemented as `issuperset`: https://docs.python.org/3/library/stdtypes.html#frozenset.issuperset. The answer for lists is irrelevant because sets don't have a "first element", they're not semantically ordered data structures, and likewise the Python docs you link are for _sequences_. — jonrsharpe, Sep 22 '21 at 08:16
That makes sense, I didn't think of set as not being a sequence. I did look at that page, but I think I got confused with the lexical ordering that the other links discussed. — Karl, Sep 22 '21 at 11:02

Lennart Regebro · Accepted Answer · 2021-09-22T08:39:10.267

For sets, >= means that the left-hand set contains every item of the right-hand set, and possibly more. In other words, it is a superset test. Ordering doesn't come into it. So both of these comparisons are true:

{'C', 'F'} == {'C', 'F'}
{'F', 'C'} == {'C', 'F'}

These sets are all equal. Add a character, and they are no longer equal:

{'A', 'C', 'F'} != {'C', 'F'}

The first set is now instead seen as "larger than", because it contains all of the right hand set, and more:

{'A', 'C', 'F'} > {'C', 'F'}

Again, order makes no difference:

{'F', 'C', 'A'} > {'C', 'F'}

This means that greater than or equal is true for any set that contains all of the characters, even if it contains additional characters:

{'C', 'F'} >= {'C', 'F'}
{'A', 'C', 'F'} >= {'C', 'F'}
{'F', 'C', 'A'} >= {'C', 'F'}

But remove 'C' or 'F', and the set is neither equal to nor larger than {'C', 'F'}. In fact, if it contains only 'C' or only 'F', it is seen as "less than":

{'C'} < {'C', 'F'}
{'F'} < {'C', 'F'}

But if the set contains other characters while missing 'C' or 'F', it is neither equal to, larger than, nor smaller than {'C', 'F'}. All of these are False:

{'A', 'B', 'Q'} == {'C', 'F'}
{'A', 'B', 'Q'} > {'C', 'F'}
{'A', 'B', 'Q'} < {'C', 'F'}

{'A', 'C', 'Q'} == {'C', 'F'}
{'A', 'C', 'Q'} > {'C', 'F'}
{'A', 'C', 'Q'} < {'C', 'F'}

Only "not equal" is true:

{'A', 'B', 'Q'} != {'C', 'F'}
{'A', 'C', 'Q'} != {'C', 'F'}

So requirements >= {'C', 'F'} is true if requirements contains both 'C' and 'F', regardless of order, including when it contains more items than 'C' and 'F'. It is not true if requirements contains only one of 'C' and 'F', or neither.
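
As a minimal sketch of that check (the function name `check_order` and the normalization step are hypothetical, not NumPy's actual code), the superset test could be wrapped like this:

```python
def check_order(requirements):
    """Raise if both 'C' and 'F' are requested (hypothetical helper)."""
    # Normalize any iterable of flags into a set so that >= is a superset test.
    requirements = set(requirements)
    if requirements >= {'C', 'F'}:
        raise ValueError('Cannot specify both "C" and "F" order')
    return requirements


check_order(['A', 'C'])  # fine: only one of the two flags is present

try:
    check_order(['F', 'A', 'C'])  # both flags present, in any order
except ValueError as exc:
    message = str(exc)
```

The extra 'A' does not matter here; only the presence of both 'C' and 'F' triggers the error.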

score 2 · Answer 2 · answered Sep 22 '21 at 08:40

The data within a set are unordered. Sets are built on hash tables (sometimes described as "unordered associative arrays"), so the order in which elements are stored depends on how they are hashed. For example, "b" might be placed in memory after "a" in one run and before it in the next. Lists and tuples, by contrast, are stored and accessed in the order in which they were defined, regardless of machine.

To visualize this, create a script:

print({"a", "b", "c"})

Then try running it 3 times:

$ python3 script.py 
{'c', 'a', 'b'}
$ python3 script.py 
{'b', 'a', 'c'}
$ python3 script.py 
{'a', 'b', 'c'}

As you can see, comparing sets element by element in lexical order would be very unreliable: given the exact same input, the order can differ from run to run.
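
Set comparisons never look at that iteration order, which is why they stay reliable. A small sketch with illustrative variable names follows; when a stable order is needed for display, `sorted()` provides one:

```python
s1 = {"a", "b", "c"}
s2 = {"c", "b", "a"}

# Equality depends only on membership, not on how the set was written
# or on the order in which it happens to iterate.
same = s1 == s2

# sorted() returns a list in a deterministic order for printing.
ordered = sorted(s1)  # ['a', 'b', 'c']
```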

Lastly, you might want to look at the documentation of set. It explicitly defines how the comparison operators behave, e.g. <= means issubset. So the main reason set comparison is different is that set overrides these operators.

issubset(other)

set <= other

Test whether every element in the set is in other.

It doesn't say anything about lexical order. All it says is that it checks whether the elements of one set are in the other, without regard to order.
This is just one of the comparison methods. Refer to the docs for the complete list.
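
A short sketch of how the operators line up with the methods (the variable names are illustrative). One difference worth knowing: the methods accept any iterable, while the operators require both operands to be sets:

```python
small = {'C', 'F'}
big = {'A', 'C', 'F'}

# Each operator gives the same answer as the corresponding method.
assert (small <= big) == small.issubset(big)
assert (big >= small) == big.issuperset(small)

# The methods accept any iterable, such as a list...
assert big.issuperset(['C', 'F'])

# ...but the operators raise TypeError when the other operand is not a set.
try:
    big >= ['C', 'F']
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
```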

score 0 · Answer 3 · answered Sep 22 '21 at 09:11

In the context of sets, > is simply used to indicate a (proper) superset and < for a (proper) subset.

set([1, 2]) <= set([1, 2, 3]) 
# True
set([1, 2]) < set([1, 2]) 
# False

Also, as per the docs:

The subset and equality comparisons do not generalize to a total ordering function. For example, any two nonempty disjoint sets are not equal and are not subsets of each other, so all of the following return False: a<b, a==b, or a>b.
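
A quick sketch of that quoted point, using two disjoint sets (the names `a` and `b` are illustrative). Unlike with numbers, "not less than" does not imply "greater than or equal":

```python
a = {'A', 'B', 'Q'}
b = {'C', 'F'}

# Every ordering comparison is False for nonempty disjoint sets.
results = [a < b, a == b, a > b, a <= b, a >= b]

# With numbers, not (x < y) implies x >= y; with sets it does not.
not_less = not (a < b)     # True
greater_or_equal = a >= b  # False
```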
