Calculating the Letter Frequency in Python

Question

I need to define a function that finds every index at which a given character occurs in a string, sums those indices, divides the sum by the number of times the character occurs, and then divides the result by the length of the text.

Here's what I have so far:

def ave_index(char):
  passage = "string"
  if char in passage:
    word = passage.split(char)
    words = len(word)
    number = passage.count(char)
    answer = word / number / len(passage)
    return(answer)

  elif char not in passage:
    return False

So far, the answers I've gotten when running this have been quite off the mark.

EDIT: The passage we were given to use as a string - 'Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'

When char = 's', the answer should be 0.5809489252885479.

Give sample input and expected output — The6thSense, Aug 31 '15 at 04:36

Nir Alfasi · Answer 1 · 2015-08-31T04:54:35.370

You can use Counter to check frequencies:

from collections import Counter
words = 'The passage we were given to use as a string - Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people\'s hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.'

freqs = Counter(words)  # Counter iterates over the string directly and counts each character
print(float(freqs['s']) / len(words))

I use Python through another website (as required by my course) so importing isn't an option for me unfortunately — Saltharion, Aug 31 '15 at 04:55
`collections` is part of the Python standard library, so it is usually available in any Python installation — AChampion, Aug 31 '15 at 05:37
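
If importing really is unavailable, the same relative frequency can probably be obtained with built-in string methods alone. A minimal sketch, where letter_frequency is a hypothetical helper name and not part of the answer above:

```python
def letter_frequency(text, char):
    """Return the fraction of characters in text that equal char."""
    if not text:
        return 0.0  # avoid dividing by zero for an empty string
    # str.count counts the occurrences of char; float() keeps the
    # division from truncating under Python 2 as well as Python 3.
    return text.count(char) / float(len(text))


print(letter_frequency('hello world', 'l'))
```

Note that this, like the Counter version, gives how often the letter occurs, not the average position the question asks about.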

Burhan Khalid · Accepted Answer · 2015-08-31T04:53:06.113

The problem is how you are counting the letters. Take the string hello world and suppose you want to count how many times l occurs. We know there are 3, but if you split on l:

>>> s = 'hello world'
>>> s.split('l')
['he', '', 'o wor', 'd']

you get 4 pieces, one more than the number of occurrences. Further, we need the position of each instance of the character in the string, which the pieces do not give us directly.
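
To make the off-by-one concrete, here is a small sketch comparing split with str.count on the same string. The loop that rebuilds the positions from the pieces is only an illustration of why split is an awkward tool here, not something the original code did:

```python
s = 'hello world'
pieces = s.split('l')

# split returns the text between the separators, so it always yields
# one more piece than there are occurrences of the separator.
print(len(pieces))   # 4
print(s.count('l'))  # 3

# The positions can be recovered from the pieces, but only indirectly:
# each occurrence sits right after the pieces that precede it.
positions = []
offset = 0
for piece in pieces[:-1]:
    offset += len(piece)
    positions.append(offset)
    offset += 1  # step over the 'l' itself
print(positions)  # [2, 3, 9]
```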

The enumerate built-in helps us out here:

>>> s = 'hello world'
>>> c = 'l'  # The letter we are looking for
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results
[2, 3, 9]

Now we have the total number of occurrences len(results), and the positions in the string where the letter occurs.

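The final "trick" to this problem is to make sure the very first division is a float division. In Python 2, / between two integers truncates, so results_sum / len(results) would silently drop the fractional part of the average index before the division by the length is ever reached.

Working against your sample text (stored in s):

>>> c = 's'
>>> results = [k for k,v in enumerate(s) if v == c]
>>> results_sum = sum(results)
>>> float(results_sum) / len(results) / len(s)
0.5809489252885479

This matches the value you expected.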
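
Putting the steps together, the function from the question might look roughly like the sketch below. Passing the passage in as a parameter and returning None when the character is absent are assumptions made for this example; the question's version hard-coded the passage and returned False.

```python
def ave_index(passage, char):
    """Average index of char in passage, divided by len(passage).

    Returns None when char does not occur (a choice made for this
    sketch; the question's code returned False instead).
    """
    positions = [i for i, ch in enumerate(passage) if ch == char]
    if not positions:
        return None
    # float() forces true division even on Python 2, so the average
    # index keeps its fractional part.
    mean_position = float(sum(positions)) / len(positions)
    return mean_position / len(passage)


print(ave_index('hello world', 'l'))
```

For 'hello world' and 'l' the positions are 2, 3 and 9, so the function returns (14 / 3) / 11.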
