Python theory - List comparison without len function

Question

I have two lists:

a = ['computador', 'caderno', 'lapiseira', 'caneta', 'cadeira', 'mesa']
b = ['computador', 'celular', 'café', 'água']

When I compare the list lengths with an if condition, the result is exactly what I expected: list a is longer than list b.

if len(a) > len(b):
  print('a > b')
else:
  print('a < or = b')

The output is "a > b".

I tried the same comparison without len and the result is the opposite: "a < or = b"

if a > b:
  print('a > b')
else:
  print('a < or = b')

What is Python comparing in the second case?

what does `(1,2,3) > (1,2,3,4)` do? classes / built ins overload the < > >= <= operators, look up `__gt__(self, other)` etc. f.e. here: https://docs.python.org/3/reference/datamodel.html?highlight=__gt__#object.__gt__ — Patrick Artner, Jul 04 '20 at 16:02
Elements of same index in both lists. As soon as one is greater than the other comparison is done (similar logic as lexical comparison of strings character by character). — Michael Butscher, Jul 04 '20 at 16:03

Roshin Raphel · Answer 1 · 2020-07-04T16:10:48.287

When you use:

if a > b:
  print('a > b')
else:
  print('a < or = b')

The lists are compared element-wise. First a[0] and b[0] are compared, then a[1] and b[1], and so on until a pair differs. Because these elements are strings, each pair is compared the same way, character by character. For comparison of characters, their Unicode code point is used (which matches the ASCII value for unaccented English letters).

So when you use a > b, first 'computador' and 'computador' are compared. Since both are the same, 'caderno' and 'celular' are compared next. Here the first character, c, is the same, so the interpreter checks the second characters, a and e. Since e is numerically greater than a, 'caderno' is less than 'celular', so the condition a > b is false and print('a < or = b') is executed.
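The walk-through above can be checked with a small sketch. The helper `first_difference` is a hypothetical name introduced only for illustration; it is not how CPython implements the comparison, it just finds the pair that decides the result.

```python
a = ['computador', 'caderno', 'lapiseira', 'caneta', 'cadeira', 'mesa']
b = ['computador', 'celular', 'café', 'água']

def first_difference(x, y):
    """Return the first pair of items that differ, or None if no position differs."""
    for left, right in zip(x, y):
        if left != right:
            return left, right
    return None

pair = first_difference(a, b)
print(pair)                     # ('caderno', 'celular')

# The same idea applies one level down, on the characters of the two strings.
chars = first_difference(*pair)
print(chars)                    # ('a', 'e')
print(ord('a'), ord('e'))       # 97 101

print(a > b)                    # False
```

Because the deciding pair is 'a' versus 'e', and 97 is less than 101, the whole comparison a > b comes out False.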

revliscano · Answer 2 · 2020-07-04T16:53:18.160

When using the comparison operators <, <=, >, >= on two iterables* (at least for strings, lists and tuples), Python does not compare their lengths but compares them in lexicographical order. This means that it compares each item of one iterable with the item in the same position in the other. So in your example the pairs are:

'computador' > 'computador' (false since they are the same)
'caderno' > 'celular' (false because the letter 'a' of caderno comes before the letter 'e' of celular)
'lapiseira' > 'café' (true because the 'l' of lapiseira is greater than the 'c' of café)
'caneta' > 'água' (false because the 'c' of caneta is not greater than the 'á' (note the accent mark) of água)

Python actually only compares until it finds the first pair that differs. It skips the equal pair 'computador' and 'computador', then stops at 'caderno' > 'celular', which decides the result. I listed all four only to illustrate how each comparison works.

So the comparison a > b returns False.

(*) Not every iterable supports comparisons using the operators mentioned above. The type has to implement them. Python gives you this lexicographical behavior by default (as far as I know) only for strings, tuples and lists.

With set instances, for example, the operators behave completely differently: they test subset and superset relations.
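A short sketch of the footnote, using only built-in types: sets define < and > as proper-subset and proper-superset tests, while range objects define no ordering at all and raise TypeError.

```python
# Sets: < and > test subset / superset relations, not lexicographical order.
subset_result = {1, 2} < {1, 2, 3}
print(subset_result)            # True: {1, 2} is a proper subset

# Neither set contains the other, so both comparisons are False.
less = {1, 5} < {1, 2, 3}
greater = {1, 5} > {1, 2, 3}
print(less, greater)            # False False

# range objects support == but not ordering comparisons.
try:
    range(3) > range(4)
    range_is_orderable = True
except TypeError:
    range_is_orderable = False
print(range_is_orderable)       # False
```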

This is not generally true for iterables - `range(3) > range(4)` results in an error. — tdelaney, Jul 04 '20 at 16:26
Clarified that it works like that at least for tuples, lists and strings. @tdelaney — revliscano, Jul 04 '20 at 16:35
Yes, but that's because those objects implemented comparison operations. I think your explanation is good, just too broad. If you can change the blanket statement on iterables, then you'd have it. — tdelaney, Jul 04 '20 at 16:37

tdelaney · Answer 3 · 2020-07-04T16:20:18.617

In the second case, Python is doing an element-by-element comparison, and 'celular' is greater than 'caderno'. Had the two lists been equal up to the last element of the shorter one, then the shorter list would be the lesser. Something is greater than nothing.
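The "something is greater than nothing" rule can be seen with two lists where one is a prefix of the other. The names `short_list` and `long_list` are made up for this sketch.

```python
short_list = ['computador', 'celular']
long_list = ['computador', 'celular', 'café']

# Every shared position is equal, so length breaks the tie.
print(short_list < long_list)   # True
print(long_list > short_list)   # True

# The empty list is less than any non-empty list.
print([] < ['mesa'])            # True
```

Length only matters in this tie-break case; as soon as one position differs, the lengths are never consulted.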

Strings are like lists but elementwise comparison is done on each character's ordinal number. This may follow well-accepted ordering rules in a language but is not guaranteed to. For instance 'ä' is greater than 'b'.
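As a quick illustration of the ordinal comparison, the sketch below prints the code points involved. The escape '\u00e4' is used for 'ä' so that the example does not depend on how the source file encodes the character.

```python
a_umlaut = '\u00e4'             # 'ä' as a single precomposed code point

print(ord('b'), ord(a_umlaut))  # 98 228
print(a_umlaut > 'b')           # True

# Sorting uses the same ordering, so 'ä' lands after 'b' and even after 'z'.
ordered = sorted(['b', a_umlaut, 'a', 'z'])
print(ordered == ['a', 'b', 'z', a_umlaut])  # True
```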

Further, Unicode characters can be expressed in more than one way. unicodedata.normalize converts equivalent sequences, such as a base letter followed by a combining accent, to a single normalized form for comparison. We should probably always do that when comparing text strings, but since we don't usually bump into this case, the bug is not obvious.
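A minimal sketch of the normalization point, using the 'é' from 'café' written two ways. The choice of the NFC form here is an assumption made for the example; any single form applied consistently to both sides would work for comparison.

```python
import unicodedata

composed = 'caf\u00e9'          # 'é' as one precomposed code point
decomposed = 'cafe\u0301'       # 'e' followed by COMBINING ACUTE ACCENT

# The two strings usually render identically but are different sequences.
print(len(composed), len(decomposed))   # 4 5
print(composed == decomposed)           # False

# Normalizing both sides to the same form makes them compare equal.
same_after_nfc = (unicodedata.normalize('NFC', composed)
                  == unicodedata.normalize('NFC', decomposed))
print(same_after_nfc)                   # True
```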
