
I need to count the signs (characters) a to z, A to Z, space, "." and "," in some data.

I have tried this code:

import codecs
from collections import Counter

fh = codecs.open("mydata.txt", encoding="utf-8")
text = fh.read()
fh1 = unicode(text)
dic_freq_signs = dict(Counter(fh1.split()))
All_freq_signs = dic_freq_signs.items()
List_signs = dic_freq_signs.keys()
List_freq_signs = dic_freq_signs.values()

But it gives me all signs, not just the ones I am looking for. Can anyone help?

(And it has to be Unicode.)

Babe

2 Answers


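Check how to iterate over a dictionary and keep only the items that meet a condition:

# General pattern: [item for item in d.items() if <condition on item>]
def criteria(item):
    key, count = item         # each item is a (key, count) pair
    return count % 2 == 0     # placeholder condition: keep even counts

All_freq_signs = [item for item in dic_freq_signs.items() if criteria(item)]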
0

Make sure you import the string module; with it you can get the character ranges a to z and A to Z easily:

import string

Counter(any_string) gives the count of each character in the string. With split(), the Counter instead returns the count of each whitespace-separated word in the string, which contradicts your requirement. So I have assumed that you need character counts.
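To make the difference concrete, here is a small sketch on a made-up string (the `sample` value is only an illustration, not the asker's data):

```python
from collections import Counter

sample = u"ab ab, c."   # hypothetical sample text

char_counts = Counter(sample)          # one key per character
word_counts = Counter(sample.split())  # one key per whitespace-separated word

print(sorted(char_counts.keys()))  # single characters, including ' ', ',' and '.'
print(sorted(word_counts.keys()))  # whole tokens such as u"ab," and u"c."
```

Note that split() keeps punctuation attached to the words, so u"ab" and u"ab," end up as separate keys.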

dic_all_chars = dict(Counter(fh1))    # counts of all characters in the string
# ascii_lowercase/ascii_uppercase are locale-independent, unlike string.lowercase/string.uppercase
signs = string.ascii_lowercase + string.ascii_uppercase + ' .,'    # the characters you want to check

# use a dict comprehension and keep only the keys that are among the wanted characters
dic_freq_signs = {key: value for key, value in dic_all_chars.items()
                  if key in signs}

dic_freq_signs then contains only the signs you want to count as keys, with their counts as values.
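Putting the pieces together, a minimal self-contained sketch could look like the following. The names `SIGNS` and `count_signs` are hypothetical, and the sketch filters the characters before counting rather than after, which is a small variation on the approach above; the sample string is made up.

```python
import string
from collections import Counter

# Characters of interest: a-z, A-Z, space, period and comma.
# A set makes the membership test exact for single characters.
SIGNS = set(string.ascii_letters + u" .,")

def count_signs(text):
    """Return a dict mapping each wanted character in text to its count."""
    return dict(Counter(ch for ch in text if ch in SIGNS))

sample = u"Hi, hi. \u00c5h!"   # hypothetical sample with a non-ASCII letter
freq = count_signs(sample)
print(freq)   # the non-ASCII letter and '!' are left out
```

For the real data, the text could be read with something like io.open("mydata.txt", encoding="utf-8").read(), which returns Unicode text, and then passed to count_signs.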

Prashanth