Python descending order of wordcount

Question

I'm using this code to count the frequency of word appearance in a text file:

#!/usr/bin/python
file=open("out1.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for k,v in wordcount.items():
    print k, v

How can I print the output in the descending order of frequency numbers?

As a side note, don't forget to close the file. I suggest using the `with` statement if possible. — Cristian Ciupitu, Apr 20 '14 at 00:21
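Following that suggestion, here is a minimal Python 3 sketch of the question's loop. It closes the file through `with` and returns the pairs sorted by descending count. The helper names `count_words` and `by_descending_count` are hypothetical and are not from the question.

```python
def count_words(path):
    """Count whitespace-separated words in the file at `path`."""
    wordcount = {}
    # The with statement closes the file automatically, even on error.
    with open(path, "r") as f:
        for word in f.read().split():
            # dict.get supplies 0 the first time a word is seen
            wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


def by_descending_count(wordcount):
    """Return (word, count) pairs sorted from most to least frequent."""
    return sorted(wordcount.items(), key=lambda item: item[1], reverse=True)


# Usage, assuming out1.txt exists as in the question:
#     for word, count in by_descending_count(count_words("out1.txt")):
#         print(word, count)
```

`sorted()` is stable even with `reverse=True`, so words with equal counts keep the order in which they were first seen.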

score 7 · Answer 1 · answered Apr 20 '14 at 00:25

Use `Counter.most_common()` with no argument to get every word paired with its count, sorted in descending order of frequency.

from collections import Counter

word_count = Counter()

with open("out1.txt", "r") as file:
    word_count.update(file.read().split())

for word, count in word_count.most_common():
    print word, count

the 6
Lorem 4
of 4
and 3
Ipsum 3
text 2
type 2

score 2 · Answer 2 · answered Apr 20 '14 at 00:17

You can build a list of (word, count) tuples and sort it by the count. Here's an example:

wordcount = {'cat':1,'dog':2,'kangaroo':20}

ls = list(wordcount.items())

ls.sort(key=lambda x:x[1],reverse=True)

for k,v in ls:
    print k, v

Output:

kangaroo 20
dog 2
cat 1
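Since `sorted()` accepts the items view directly and returns a new list, the intermediate list and in-place sort can be folded into one call. The sketch below uses `operator.itemgetter(1)` in place of the lambda; the two are equivalent here.

```python
from operator import itemgetter

wordcount = {'cat': 1, 'dog': 2, 'kangaroo': 20}

# itemgetter(1) picks the count out of each (word, count) pair;
# reverse=True puts the largest count first.
ranked = sorted(wordcount.items(), key=itemgetter(1), reverse=True)

for word, count in ranked:
    print(word, count)
```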

A.J. Uppal · Accepted Answer · score 2 · 2014-04-20T00:36:53.647

Here is the code:

wordcount = {}
with open("out1.txt", "r") as file:
    for word in file.read().split():
        word = word.lower()
        # isalpha() must be called; this skips tokens containing non-letters
        if word.isalpha():
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1

# Put the count first so that sorting orders by frequency
copy = []
for k, v in wordcount.items():
    copy.append((v, k))

copy = sorted(copy, reverse=True)

for k in copy:
    print '%s: %d' % (k[1], k[0])

out1.txt:

hello there I am saying hello world because Bob is here and I am saying hello because John is here

Runs as:

hello: 3
saying: 2
is: 2
i: 2
here: 2
because: 2
am: 2
world: 1
there: 1
john: 1
bob: 1
and: 1
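Because this answer sorts `(count, word)` tuples with `reverse=True`, words with equal counts come out in reverse alphabetical order (for example `world` before `there`). If ties should be alphabetical instead, one option is to negate the count in the sort key. A minimal Python 3 sketch using the sample sentence above in place of reading the file:

```python
# Sample sentence from the answer, in place of reading out1.txt.
text = ("hello there I am saying hello world because Bob is here "
        "and I am saying hello because John is here")

wordcount = {}
for word in text.lower().split():
    if word.isalpha():  # note the call: isalpha is a method
        wordcount[word] = wordcount.get(word, 0) + 1

# Negating the count sorts it descending while the word itself
# still sorts ascending, so ties come out in alphabetical order.
ranked = sorted(wordcount.items(), key=lambda item: (-item[1], item[0]))

for word, count in ranked:
    print('%s: %d' % (word, count))
```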

edited Apr 20 '14 at 00:36

answered Apr 20 '14 at 00:22

A.J. Uppal


@loop_digga Note that this won't work if you want it to be case-insensitive or want to include words that end with periods, etc. I'm sure aj8uppal knows how to alter his code to make it work for that, but I thought you should know so you don't come back confused as to why it's not counting "This this" as 2 "this" words. – Mdev Apr 20 '14 at 00:31
@Human thanks, will check into that! – bcrvc Apr 20 '14 at 00:33

I will edit my code to fix that, thanks! – A.J. Uppal Apr 20 '14 at 00:34
`if word.isalpha == True:` What is this supposed to check for? – Manish Ranjan Feb 17 '16 at 20:20
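The check is meant to skip tokens that contain anything other than letters, such as `hello.` or `42`. Written without parentheses, though, `word.isalpha` is the method object itself, not its result, so comparing it to `True` is always false and no word gets counted. A short Python 3 illustration:

```python
word = "hello"

# Without parentheses this is the bound method, which never equals True.
print(word.isalpha == True)   # False

# Calling the method returns True only if the string is non-empty
# and every character in it is a letter.
print(word.isalpha())         # True
print("hello.".isalpha())     # False: '.' is not a letter
print("".isalpha())           # False: empty string
```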

Mdev · Answer 4 · 2014-04-20T00:38:58.060

Use the `Counter` class from the `collections` module.

from collections import Counter

s = "This is a sentence this is a this is this"

c = Counter(s.split())
# s.split() returns a list of words; with no argument it splits on any run of whitespace

print c

Counter({'is': 3, 'this': 3, 'a': 2, 'This': 1, 'sentence': 1})

This won't count words correctly when they carry trailing punctuation or differ only in capitalization: "This" and "this." are each counted separately from "this". To fix that, strip punctuation from the end of each word and convert the whole text to lowercase (or uppercase) before counting.

You can get rid of both of those problems with this:

s1 = "This is a sentence. This is a. This is. This."
s2 = ""

for word in s1.split():
    # strip one trailing punctuation mark; a regex would be more robust
    if word.endswith(('.', '!', '?')):
        s2 += word[:-1] + " "
    else:
        s2 += word + " "

c = Counter(s2.lower().split())

print c

Counter({'this': 4, 'is': 3, 'a': 2, 'sentence': 1})
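The comment in the loop above mentions a regex. One possible version, sketched in Python 3, extracts words with `re.findall` so punctuation never becomes part of a word. The pattern `[a-z']+` is an assumption: it only covers ASCII letters and apostrophes, so words with accents or digits would be split or dropped.

```python
import re
from collections import Counter

s1 = "This is a sentence. This is a. This is. This."

# Lowercasing first makes the count case-insensitive; [a-z']+ then
# matches runs of letters/apostrophes, leaving out '.', '!', '?', etc.
words = re.findall(r"[a-z']+", s1.lower())
c = Counter(words)
print(c.most_common())  # [('this', 4), ('is', 3), ('a', 2), ('sentence', 1)]
```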
