The function decipher( S ) is given a string of English text that has been shifted (rotated) by some unknown amount. It should return, as best it can, the original English string, which will be some rotation (possibly 0) of the input S. This means I have to try each of the 26 possible decodings and estimate how English each one looks.
My approach is to use letter frequencies:
```python
def letProb( c ):
    """ if c is an alphabetic character,
        we return its monogram probability (for english),
        otherwise we return 1.0. We ignore capitalization.
        Adapted from
        http://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
    """
    if c == 'e' or c == 'E': return 0.1202
    if c == 't' or c == 'T': return 0.0910
    if c == 'a' or c == 'A': return 0.0812
    if c == 'o' or c == 'O': return 0.0768
    if c == 'i' or c == 'I': return 0.0731
    if c == 'n' or c == 'N': return 0.0695
    if c == 's' or c == 'S': return 0.0628
    if c == 'r' or c == 'R': return 0.0602
    if c == 'h' or c == 'H': return 0.0592
    if c == 'd' or c == 'D': return 0.0432
    if c == 'l' or c == 'L': return 0.0398
    if c == 'u' or c == 'U': return 0.0288
    if c == 'c' or c == 'C': return 0.0271
    if c == 'm' or c == 'M': return 0.0261
    if c == 'f' or c == 'F': return 0.0230
    if c == 'y' or c == 'Y': return 0.0211
    if c == 'w' or c == 'W': return 0.0209
    if c == 'g' or c == 'G': return 0.0203
    if c == 'p' or c == 'P': return 0.0182
    if c == 'b' or c == 'B': return 0.0149
    if c == 'v' or c == 'V': return 0.0111
    if c == 'k' or c == 'K': return 0.0069
    if c == 'x' or c == 'X': return 0.0017
    if c == 'q' or c == 'Q': return 0.0011
    if c == 'j' or c == 'J': return 0.0010
    if c == 'z' or c == 'Z': return 0.0007
    return 1.0
```
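One common way to turn per-letter probabilities like these into a score for a whole candidate string is to treat the letters as independent and multiply their probabilities; summing logarithms gives the same ranking without floating-point underflow on long strings. The sketch below illustrates that swapped-in scoring under the independence assumption; it is not the approach used in this post. `FREQ` and `log_score` are hypothetical names, and the table simply repeats the values from `letProb` above.

```python
import math

# Hypothetical table: English monogram probabilities copied from letProb above.
FREQ = {
    'e': 0.1202, 't': 0.0910, 'a': 0.0812, 'o': 0.0768, 'i': 0.0731,
    'n': 0.0695, 's': 0.0628, 'r': 0.0602, 'h': 0.0592, 'd': 0.0432,
    'l': 0.0398, 'u': 0.0288, 'c': 0.0271, 'm': 0.0261, 'f': 0.0230,
    'y': 0.0211, 'w': 0.0209, 'g': 0.0203, 'p': 0.0182, 'b': 0.0149,
    'v': 0.0111, 'k': 0.0069, 'x': 0.0017, 'q': 0.0011, 'j': 0.0010,
    'z': 0.0007,
}

def log_score(s):
    """Sum of log-probabilities of the letters in s (higher = more English-like).

    Characters not in the table are skipped, which matches letProb returning
    1.0 for them, since log(1.0) == 0.
    """
    total = 0.0
    for ch in s:
        lower = ch.lower()
        if lower in FREQ:  # only letters found in the table contribute
            total += math.log(FREQ[lower])
    return total
```

For example, `log_score("hello")` is larger than `log_score("uryyb")` (its ROT13 form), because every letter of "hello" is more frequent than the letter it maps to. Every rotation of the same ciphertext has the same number of letters, so comparing these scores across the 26 candidates is fair.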
I also use this helper function:
```python
def list_to_str( L ):
    """ L must be a list of characters; then,
        this returns a single string from them
    """
    if len(L) == 0: return ''
    return L[0] + list_to_str( L[1:] )
```
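As an aside, `list_to_str` recurses once per character, so in CPython it can raise `RecursionError` for inputs longer than the interpreter's recursion limit (1000 frames by default). The built-in `str.join` does the same job iteratively. The sketch below uses the hypothetical name `list_to_str_join` so it can sit next to the original.

```python
import sys

def list_to_str_join(L):
    """Return a single string made from the list of characters L."""
    # str.join concatenates iteratively, so there is no recursion depth to exceed.
    return ''.join(L)

# A list longer than the recursion limit is no problem for join.
long_chars = ['a'] * (sys.getrecursionlimit() + 10)
print(len(list_to_str_join(long_chars)))
```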
And this:
```python
def rot(c, n):
    """ rot rotates c, a single character, forward by n spots in the
        alphabet, wrapping around from z to a (and from Z to A)
        input: c, a single character
        input: n, a non-negative integer between 0 and 25
        output: c moved forward by n spots in the alphabet
                (non-letters are returned unchanged)
    """
    if 'a' <= c <= 'z':
        neword = ord(c) + n
        if neword > ord('z'):
            neword = neword - 26
    elif 'A' <= c <= 'Z':
        neword = ord(c) + n
        if neword > ord('Z'):
            neword = neword - 26
    else:
        neword = ord(c)
    return chr(neword)
```
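To apply a rotation to a whole candidate decoding, and to see why trying 26 shifts covers every possibility, here is a sketch of the same rotation written with modular arithmetic, plus a whole-string wrapper. `rot_mod` and `rot_string` are hypothetical names; `rot_mod` should agree with `rot` for n in 0..25 and additionally accepts any integer n, including negative ones.

```python
def rot_mod(c, n):
    """Rotate the ASCII letter c forward by n places (any int n); other characters unchanged."""
    if 'a' <= c <= 'z':
        return chr((ord(c) - ord('a') + n) % 26 + ord('a'))
    if 'A' <= c <= 'Z':
        return chr((ord(c) - ord('A') + n) % 26 + ord('A'))
    return c

def rot_string(s, n):
    """Rotate every letter of s by n places, leaving punctuation and spaces alone."""
    return ''.join(rot_mod(c, n) for c in s)

print(rot_string("Hello, World", 3))  # Khoor, Zruog
```

Shifting by n and then by 26 - n returns the original text, so a message encoded with shift n is recovered by the decoding shift 26 - n. That is why `range(26)` enumerates every candidate.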
And finally this:
```python
def decipher( S ):
    """ returns the rotation of S that looks the most like English
    """
    L = [[rot(c, n) for c in S] for n in range(26)]
    LoL = [[sum(letProb(c)) for letter in encoding] for encoding in L ]
    L = max(LoL)
    return list_to_str(L)
```
The helper functions work, but something is definitely wrong in the last function, specifically in this line:

```python
LoL = [[sum(letProb(c)) for letter in encoding] for encoding in L ]
```

which raises:

```
TypeError: 'float' object is not iterable
```
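`sum` expects an iterable (a list, a generator, and so on), but `letProb(c)` returns a single float, hence the `TypeError`. The line also scores `c` instead of the loop variable `letter`, and `max(LoL)` would then pick the "largest" list of scores rather than the decoding with the highest score, so `list_to_str` would receive numbers instead of characters. Below is a minimal corrected sketch that keeps the same sum-of-probabilities scoring: it computes one `sum(...)` over a generator per candidate and uses `max(..., key=...)` to return the best candidate string. So that it runs on its own, it repeats condensed versions of `letProb` and `rot`; the `FREQ` table is a hypothetical condensation of `letProb`.

```python
# English monogram probabilities, condensed from letProb in the question.
FREQ = {
    'e': 0.1202, 't': 0.0910, 'a': 0.0812, 'o': 0.0768, 'i': 0.0731,
    'n': 0.0695, 's': 0.0628, 'r': 0.0602, 'h': 0.0592, 'd': 0.0432,
    'l': 0.0398, 'u': 0.0288, 'c': 0.0271, 'm': 0.0261, 'f': 0.0230,
    'y': 0.0211, 'w': 0.0209, 'g': 0.0203, 'p': 0.0182, 'b': 0.0149,
    'v': 0.0111, 'k': 0.0069, 'x': 0.0017, 'q': 0.0011, 'j': 0.0010,
    'z': 0.0007,
}

def letProb(c):
    """Probability of letter c (case-insensitive); 1.0 for anything else."""
    return FREQ.get(c.lower(), 1.0)

def rot(c, n):
    """Rotate ASCII letter c forward by n (0..25); other characters unchanged."""
    if 'a' <= c <= 'z':
        return chr((ord(c) - ord('a') + n) % 26 + ord('a'))
    if 'A' <= c <= 'Z':
        return chr((ord(c) - ord('A') + n) % 26 + ord('A'))
    return c

def decipher(S):
    """Return the rotation of S whose characters have the highest total letProb."""
    # All 26 candidate decodings, each built as a string.
    candidates = [''.join(rot(c, n) for c in S) for n in range(26)]

    # One number per candidate: sum() now receives a generator of floats.
    def score(encoding):
        return sum(letProb(letter) for letter in encoding)

    # Return the candidate string with the best score.
    return max(candidates, key=score)
```

Summing probabilities works reasonably well on longer inputs. Multiplying them instead (or summing their logarithms) is a common alternative that penalizes rare letters such as q, x and z more strongly.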