Sorting and Organization of letter frequency - python

Question

I'm trying to find a way to count the number of occurrences of each letter in a text file and then display them from greatest to lowest frequency. This is what I have so far; please help me get over this brain block.

def me():
    info= input("what file would you like to select?")
    filehandle= open(info,"r")
    data=filehandle.read()
    case = data.upper()
    s=('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    for i in range(26):
        print(s[i],case.count(s[i]))



me()

No problem. I am just looking at how I can take my output and put it in order from highest to lowest occurrences — Corey Quick, Sep 04 '13 at 02:19

Phillip Cloud · Answer 1 · 2013-09-04T01:59:02.250

Python has a nice built-in class for this: collections.Counter.

In [8]: from collections import Counter

In [9]: with open('Makefile', 'r') as f:
   ...:     raw = Counter(f.read())
   ...:

In [10]: raw
Out[10]: Counter({' ': 61, 'e': 46, 'p': 38, 'a': 29, '\n': 27, 'c': 27, 'n': 27, 'l': 26, 'd': 25, '-': 22, 's': 22, 'y': 22, 't': 20, 'i': 18, 'o': 18, 'r': 17, '.': 16, 'u': 13, '\t': 12, 'm': 12, 'b': 11, 'x': 10, 'h': 9, '/': 8, ':': 8, '_': 7, "'": 6, ';': 5, '\\': 5, 'f': 5, '*': 3, 'v': 3, '{': 3, '}': 3, 'k': 2, 'H': 1, 'O': 1, 'N': 1, 'P': 1, 'Y': 1, 'g': 1})

This is from the pandas library's Makefile, BTW. To sort them by their frequency in descending order, do:

In [22]: raw.most_common()
Out[22]:
[(' ', 61),
 ('e', 46),
 ('p', 38),
 ('a', 29),
 ('\n', 27),
 ('c', 27),
 ('n', 27),
 ('l', 26),
 ('d', 25),
 ('-', 22),
 ('s', 22),
 ('y', 22),
 ('t', 20),
 ('i', 18),
 ('o', 18),
 ('r', 17),
 ('.', 16),
 ('u', 13),
 ('\t', 12),
 ('m', 12),
 ('b', 11),
 ('x', 10),
 ('h', 9),
 ('/', 8),
 (':', 8),
 ('_', 7),
 ("'", 6),
 (';', 5),
 ('\\', 5),
 ('f', 5),
 ('*', 3),
 ('v', 3),
 ('{', 3),
 ('}', 3),
 ('k', 2),
 ('H', 1),
 ('O', 1),
 ('N', 1),
 ('P', 1),
 ('Y', 1),
 ('g', 1)]

I'm purposefully not using your exact data so that you can try and adapt my solution to your problem.

`raw.most_common()` returns a list of items sorted from most common to least. So you can skip the sorting step. — Steven Rumbalski, Sep 04 '13 at 01:56

flornquake · Accepted Answer · 2013-09-04T02:13:03.550

This is exactly what collections.Counter and its most_common() method are for:

import collections
import string

def me():
    info = input("what file would you like to select? ")
    filehandle = open(info, "r")
    data = filehandle.read().upper()
    char_counter = collections.Counter(data)
    for char, count in char_counter.most_common():
        if char in string.ascii_uppercase:
            print(char, count)

me()

A Counter is a dictionary that counts the number of occurrences of different items (in this case, characters). char_counter.most_common() gives us all pairs of characters and counts in sorted order.

We're only interested in letters, so we check if the character is in string.ascii_uppercase. This is simply a string of letters from A to Z.
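To make the steps easier to follow, here is a minimal sketch of the same idea applied to a literal string instead of a file. The function name `letter_frequencies` and the sample text are made up for illustration; only `collections.Counter`, `most_common()` and `string.ascii_uppercase` come from the answer above.

```python
import collections
import string

def letter_frequencies(text):
    """Return (letter, count) pairs for A-Z, most frequent first."""
    # Counter tallies every character, including spaces and punctuation.
    counter = collections.Counter(text.upper())
    # most_common() yields pairs sorted by count, highest first;
    # the filter keeps only the letters A-Z.
    return [(ch, n) for ch, n in counter.most_common()
            if ch in string.ascii_uppercase]

pairs = letter_frequencies("Hello, World!")  # hypothetical sample input
for letter, count in pairs:
    print(letter, count)
```

Here `L` (3 occurrences) is printed first and `O` (2) second; the comma, space and exclamation mark are counted by the Counter but dropped by the filter.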

Thanks for the help, flornquake. Could you help me better understand what you did here? — Corey Quick, Sep 04 '13 at 01:56

score 0 · Answer 3 · answered Sep 04 '13 at 01:22

This looks very much like homework, and I hope you are using this website properly. Still, I'm glad you came to the right place, and I'll try to help you out, at least this time.

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> input_txt = "Now you are just somebody that I used to know"
>>> for letter in input_txt:
...     d[letter] += 1
... 
>>> import operator
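>>> # d.items() works on Python 2 and 3 (d.iteritems() is Python 2 only); on Python 3 ties may print in a different order
>>> sorted_d = sorted(d.items(), key=operator.itemgetter(1), reverse=True)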
>>> sorted_d
[(' ', 9), ('o', 6), ('t', 4), ('e', 3), ('s', 3), ('u', 3), ('a', 2), ('d', 2), ('w', 2), ('y', 2), ('b', 1), ('I', 1), ('h', 1), ('k', 1), ('j', 1), ('m', 1), ('N', 1), ('r', 1), ('n', 1)]

score 0 · Answer 4 · answered Sep 04 '13 at 01:23

You could do something along these lines:

d={}
with open('/usr/share/dict/words') as f:
    for line in f:
        for word in line.split():
            word=word.strip()
            for c in word:
                d[c]=d.setdefault(c,0)+1

for k, v in sorted(d.items(), key=lambda t: t[1], reverse=True):
    print(k, v)  # print() call so this also runs on Python 3

For the standard Unix words file, prints:

Answer by dawg

`d.most_common()` returns a list of items sorted from most common to least. So you can skip the sorting step. – Steven Rumbalski Sep 04 '13 at 01:58
@StevenRumbalski: That would only be if you use an instance of `Counter`. A Python base dict does not have a `.most_common()` method. – dawg Sep 04 '13 at 03:01
Whoops. I misread your answer to think you had a `Counter`. I think perhaps I read another answer and then accidentally commented on yours. – Steven Rumbalski Sep 05 '13 at 14:10
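The distinction raised in these comments can be sketched briefly. The `counts` dict below is invented sample data; the point is only that a plain dict needs an explicit sort, while wrapping it in a `Counter` makes `most_common()` available.

```python
from collections import Counter

counts = {"a": 2, "b": 5, "c": 1}  # hypothetical plain dict of counts

# A plain dict has no most_common(), so sort its items by count instead.
by_sorted = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Alternatively, wrap the dict in a Counter, which does provide most_common().
by_counter = Counter(counts).most_common()

print(by_sorted)
print(by_counter)
```

Both approaches give the pairs ordered from the highest count to the lowest.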

score 0 · Answer 5 · answered Sep 04 '13 at 01:54

Others have already given you better solutions using collections.Counter, but your code was close; you just can't print the sorted output on the fly. You could save the counts in a list, sort it and then print:

def me():
    info = input("what file would you like to select?")
    filehandle = open(info,"r")
    data = filehandle.read()
    case = data.upper()
    s = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    result = []
    for i in range(26):
        result.append((s[i], case.count(s[i])))
    return result

result = me()
for letter, count in sorted(result, key=lambda x: x[1], reverse=True):
    print(letter, count)

Still using your logic, you can make the function more readable:

import string

def me():
    info = input("what file would you like to select?")
    filehandle = open(info,"r")
    data = filehandle.read()
    case = data.upper()
    result = []
    for letter in string.ascii_uppercase:  # string.uppercase exists only in Python 2
        result.append((letter, case.count(letter)))
    return result
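As a rough sketch of how the list-and-sort approach might be finished off, the version below takes the text as an argument rather than reading a file, and sorts with a two-part key so that letters with equal counts come out alphabetically. The name `count_letters`, the sample text and the tie-breaking rule are assumptions for illustration, not part of the original answer.

```python
import string

def count_letters(text):
    """Return a (letter, count) pair for every letter A-Z."""
    upper = text.upper()
    return [(letter, upper.count(letter)) for letter in string.ascii_uppercase]

result = count_letters("banana bread")  # hypothetical sample input

# Sort by count descending (negated count), then by letter for ties.
ranked = sorted(result, key=lambda pair: (-pair[1], pair[0]))

for letter, count in ranked[:3]:
    print(letter, count)
```

Negating the count in the key avoids `reverse=True`, which would also reverse the alphabetical order among tied letters.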
