13

I have a list of words:

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

And I want to get a list of tuples:

[(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')]

where each tuple is...

(number_of_occurrences, word)

The list should be sorted by the number of occurrences.

What I've done so far:

def popularWords(words):
    dic = {}
    for word in words:
        dic.setdefault(word, 0)
        dic[word] += 1
    wordsList = [(dic.get(w), w) for w in dic]
    wordsList.sort(reverse = True)
    return wordsList

The question is...

Is it Pythonic, elegant and efficient? Are you able to do it better? Thanks in advance.

Charles
  • 50,943
  • 13
  • 104
  • 142
Maciej Ziarko
  • 11,494
  • 13
  • 48
  • 69

3 Answers3

16

You can use the counter for this.

import collections
words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']
counter = collections.Counter(words)
print(counter.most_common())
>>> [('all', 3), ('yeah', 2), ('bye', 1), ('awesome', 1)]

It gives the tuple with reversed columns.

From the comments: collections.counter is >=2.7,3.1. You can use the counter recipe for lower versions.

SiggyF
  • 22,088
  • 8
  • 43
  • 57
  • 1
    Only in Python 2.7+ or 3.1+. Neither are widely used yet, so it's worth mentioning. – Kenan Banks Mar 09 '11 at 00:01
  • You can use [this recipe](http://code.activestate.com/recipes/576611-counter-class/) if you still use an older python version. I think it gives the same interface. – SiggyF Mar 09 '11 at 00:10
  • My tuples have reversed columns just because it was simpler for me to sort it. :) I like you answer as it is very elegant and I don't mind using Python 2.7. – Maciej Ziarko Mar 09 '11 at 00:21
6

The defaultdict collection is what you are looking for:

from collections import defaultdict

D = defaultdict(int)
for word in words:
    D[word] += 1

That gives you a dict where keys are words and values are frequencies. To get to your (frequency, word) tuples:

tuples = [(freq, word) for word,freq in D.iteritems()]

If using Python 2.7+/3.1+, you can do the first step with a builtin Counter class:

from collections import Counter
D = Counter(words)
Kenan Banks
  • 207,056
  • 34
  • 155
  • 173
2

Is it Pythonic, elegant and efficient?

Looks good to me...

Are you able to do it better?

"better"? If it's understandable, and efficient, isn't that enough?

Maybe look at defaultdict to use that instead of setdefault.

Jason S
  • 184,598
  • 164
  • 608
  • 970