7

What is the more pythonic way of getting the length of the longest word:

len(max(words, key=len))

Or:

max(len(w) for w in words)

Or.. something else? words is a list of strings. I am finding I need to do this often and after timing with a few different sample sizes the first way seems to be consistently faster, despite seeming less efficient at face value (the redundancy of len being called twice seems not to matter - does more happen in C code in this form?).
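For reference, both spellings compute the same number for any non-empty list; here's a quick sanity-check sketch (the sample list is made up):

```python
# Both forms agree: one finds the longest word and takes its length,
# the other takes the lengths and finds the max.
words = ['now', 'is', 'the', 'winter', 'of', 'our', 'discontent']

via_max_key = len(max(words, key=len))    # longest word, then its length
via_genexp = max(len(w) for w in words)   # lengths, then the max

assert via_max_key == via_genexp == 10    # 'discontent' has 10 characters
```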

wim
  • @isedev that'll give the word, not the length of the word – Jon Clements Feb 01 '13 at 00:27
    Personally I prefer the latter, looks prettier – Wolph Feb 01 '13 at 00:28
    ``len(max(words, key=len))`` is good because it serves as a *Oh I forgot ``max`` took key as an argument.* reminder for the wet-ware. – sotapme Feb 01 '13 at 00:31
  • In general python works well with english, so if you can read it and it's clear, you're probably good. 1. len max of words, or 2. max len of words. – monkut Feb 01 '13 at 00:32
  • @monkut: But the first is really more like "len max of words by len". – abarnert Feb 01 '13 at 01:02
  • @abarnert yeah, but you got key in there too, so maybe, "len max of words keyed by len". A little long for my taste, but it really comes down to what your team is more comfortable with. – monkut Feb 01 '13 at 01:45
  • @monkut: I guess my brain just skipped over the word `key`. But that just makes the point even stronger. I guess I probably should have said what that point _was_ instead of expecting people to read my mind… The first one is more verbose and repetitive in English, because it's more verbose and repetitive in terms of concepts. I think that's more of a downside than the "less boilerplate" is an upside, so I prefer the second version. – abarnert Feb 01 '13 at 01:51

6 Answers

8

Although:

max(len(w) for w in words)

does kind of "read" more easily - you've got the overhead of a generator.

While:

len(max(words, key=len))

can be optimised away, since the key is a builtin and len is normally a very cheap operation for strings, so it's going to be faster...

Jon Clements
    That being said - I can't say which is more "Pythonic" - I like both, but for someone unfamiliar with the use of `max` with `key` perhaps the former is going to be more immediately grokkable – Jon Clements Feb 01 '13 at 00:36
5

I think both are OK, but unless speed is a big consideration, max(len(w) for w in words) is the most readable.

When I was looking at them, it took me longer to figure out what len(max(words, key=len)) was doing, and I was still wrong until I thought about it more. Code should be immediately obvious unless there's a good reason for it not to be.

It's clear from the other posts (and my own tests) that the less readable one is faster. But it's not like either of them are dog slow. And unless the code is on a critical path it's not worth worrying about.

Ultimately, I think more readable is more Pythonic.

As an aside, this one of the few cases in which Python 2 is notably faster than Python 3 for the same task.

Omnifarious
  • In my tests, 3.3.0 beat 2.7.2 for every version I could come up with. (See my answer for the obvious ones.) – abarnert Feb 01 '13 at 01:00
  • Update: Actually, if I run them both in 32-bit mode, 3.3.0 is significantly slower. But then almost everything seems slow in 32-bit 3.2 or 3.3, at least on Macs, so I don't think there's anything specific to this case. – abarnert Feb 01 '13 at 01:01
  • @abarnert: Interesting. I ran them both in 64-bit mode on a Linux system. One was Python 2.7.3 and the other 3.3.0. I was using `/usr/share/dict/words` as the word list. I was getting speeds of 88ms vs 66ms. Perhaps it's my choice of a long word list that made the difference. – Omnifarious Feb 01 '13 at 01:28
  • @Omnifarous: Using 70000 words instead of 700, I get almost the exact same performance numbers multiplied by 100. Basically, 64-bit 3.3 is 8-14% faster than 64-bit 2.7, but 32-bit 3.3 is 0-10% slower than 32-bit 2.7 (and 32-bit and 64-bit 2.7 are within 2% of each other). (However, PyPy seems to do a lot better with 70000 than 700, as you might expect… I didn't include it in my answer because the machine I was testing on doesn't have ipython for pypy.) – abarnert Feb 01 '13 at 01:58
4

If you rewrite the generator expression as a map call (or, for 2.x, imap):

max(map(len, words))

… it's actually a bit faster than the key version, not slower.

python.org 64-bit 3.3.0:

In [186]: words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat'] * 100
In [188]: %timeit max(len(w) for w in words)
10000 loops, best of 3: 90.1 us per loop
In [189]: %timeit len(max(words, key=len))
10000 loops, best of 3: 57.3 us per loop
In [190]: %timeit max(map(len, words))
10000 loops, best of 3: 53.4 us per loop

Apple 64-bit 2.7.2:

In [298]: words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat'] * 100
In [299]: %timeit max(len(w) for w in words)
10000 loops, best of 3: 99 us per loop
In [300]: %timeit len(max(words, key=len))
10000 loops, best of 3: 64.1 us per loop
In [301]: %timeit max(map(len, words))
10000 loops, best of 3: 67 us per loop
In [303]: %timeit max(itertools.imap(len, words))
10000 loops, best of 3: 63.4 us per loop

I think it's more pythonic than the key version, for the same reason the genexp is.

It's arguable whether it's as pythonic as the genexp version. Some people love map/filter/reduce/etc.; some hate them; my personal feeling is that when you're trying to map a function that already exists and has a nice name (that is, something you don't have to lambda or partial up), map is nicer, but YMMV (especially if your name is Guido).

One last point:

the redundancy of len being called twice seems not to matter - does more happen in C code in this form?

Think about it like this: You're already calling len N times. Calling it N+1 times instead is hardly likely to make a difference, compared to anything you have to do N times, unless you have a tiny number of huge strings.
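You can see the N vs. N+1 count directly by swapping in a counting wrapper for len (a sketch; `counted_len` is a made-up stand-in, not anything in the stdlib):

```python
# Count how many times the length function is invoked in each form.
calls = 0

def counted_len(s):
    global calls
    calls += 1
    return len(s)

words = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat'] * 100  # N = 700

calls = 0
max(counted_len(w) for w in words)
genexp_calls = calls            # called once per word: N

calls = 0
counted_len(max(words, key=counted_len))
key_calls = calls               # once per word as the key, plus the outer call: N + 1

assert genexp_calls == 700
assert key_calls == 701
```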

abarnert
1

I'd say

len(max(x, key=len))

looks quite good because you utilize a keyword argument (key) of a built-in (max) with a built-in (len). So basically max(x, key=len) gets you almost the answer. But none of your code variants look particularly un-pythonic to me.
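In other words, max(x, key=len) already hands you the longest word itself; the outer len just converts it to a number (sample data made up):

```python
x = ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat']

longest = max(x, key=len)    # the longest word itself
assert longest == 'partyhat'
assert len(longest) == 8     # wrapping with len gives the length
```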

miku
0

Just for info using ipython %timeit

In [150]: words
Out[150]: ['now', 'is', 'the', 'winter', 'of', 'our', 'partyhat']

In [148]: %timeit max(len(w) for w in words)
100000 loops, best of 3: 1.87 us per loop

In [149]: %timeit len(max(words, key=len))
1000000 loops, best of 3: 1.35 us per loop

Just updated with more words to demonstrate @Omnifarious's point/comment.

In [160]: words = map(string.rstrip, open('/usr/share/dict/words').readlines())

In [161]: len(words)
Out[161]: 235886

In [162]: %timeit max(len(w) for w in words)
10 loops, best of 3: 44 ms per loop

In [163]: %timeit len(max(words, key=len))
10 loops, best of 3: 25 ms per loop
sotapme
-1

I know it's been a year now but nevertheless, I came up with this:

'''Write a function find_longest_word() that takes a list of words and returns the length of the longest one.'''

a = ['mamao', 'abacate', 'pera', 'goiaba', 'uva', 'abacaxi', 'laranja', 'maca']

def find_longest_word(a):
    d = []
    for c in a:
        d.append(len(c))
    e = max(d)  # Try "min" :D -- computed once, after the loop
    for b in a:
        if len(b) == e:
            print "Length is %i for %s" % (len(b), b)
Russia Must Remove Putin