Cipher text Letter Freq Substitution: Comparing 2 dictionaries' dict keys by value and altering a text

Question

I've had a look at similar topics, but no solution I can find exactly compares to what I'm trying to achieve.

I have a cipher text that needs to undergo a simple letter substitution based on the frequency of each letter's occurrence in the text. I already have a function to normalise the text (lowercase, no non-letter characters), count letter occurrences and then get the relative frequency of each letter. The letter is the key in a dictionary, and the frequency is the value.

I also have the expected letter frequencies for A-Z in a separate dictionary (k=letter, v=frequency), but I'm a bit befuddled by what to do next.

What I think I need to do is to take the normalised cipher text, the expected letter freq dict [d1] and the cipher letter freq dict [d2] and iterate over them as follows (part pseudocode):

for word in text:
    for item in word:
        for k,v in d2.items():
            if d2[v] == d1[v]:
                replace any instance of d2[k] with d1[k] in text
    decoded_text=open('decoded_text.txt', 'w')
    decoded_text.write(str('the decoded text'))

Here, I want to take text and say "if the value in d2 matches a value in d1, replace any instance of d2[k] with d1[k] in text".

I realise I must have made a fair few basic Python logic errors there (I'm relatively new at Python), but am I on the right track?

Thanks in advance

Update:

Thank you for all the helpful suggestions. I decided to try Karl Knechtel's method, with a few alterations to fit in my code. However, I'm still having problems (entirely in my implementation).

I have made a decode function to take the ciphertext file in question. This calls the count function I made previously, which returns a dictionary (letter: frequency as a float). This meant that the "make uppercase version" code wouldn't work, as k and v were floats and don't have an .upper attribute. So, calling this decode function returns the ciphertext letter frequencies, and then the ciphertext itself, still encoded.

def sorted_histogram(a_dict):
    return [x[1] for x in sorted(a_dict.items(), key=itemgetter(1))]

def decode(filename):
    text=open(filename).read()
    cipher=text.lower()

    cipher_dict=count(filename)

    english_histogram = sorted_histogram(english_dict)
    cipher_histogram = sorted_histogram(cipher_dict)

    mapping = dict(zip(english_histogram, cipher_histogram))

    translated = ''.join(
        mapping.get(c, c)
        for c in cipher
    )
    return translated

Thomas K · Answer 1 · 2010-12-13T10:38:59.550

First off, note that it's very unlikely that the frequencies will give you an exact match, unless your message is very long. So you might need to do some manual tweaking to get the exact message. But if the frequencies are close enough...

You could get the keys of both dictionaries (letters), sorted by their values (frequencies):

letters_in_frequency_order = sorted(d1.keys(), key=lambda x: d1[x])

Then turn them into strings (do the same with d2 to get cypher_alphabet):

normal_alphabet = "".join(letters_in_frequency_order)

Then use them to translate the string:

import string
transtable = string.maketrans(cypher_alphabet, normal_alphabet)
cyphertext.translate(transtable)
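
For reference, here is a minimal Python 3 sketch of the same idea, where `str.maketrans` takes the place of `string.maketrans`. The two small frequency dictionaries and the helper name `by_frequency` are made up for illustration; `d1` is the expected table and `d2` the ciphertext table, as in the question.

```python
# Hypothetical toy frequencies: d1 = expected (reference), d2 = ciphertext.
d1 = {"e": 0.5, "t": 0.3, "a": 0.2}
d2 = {"x": 0.5, "q": 0.3, "z": 0.2}

def by_frequency(freqs):
    """Return the letters of freqs as one string, least frequent first."""
    return "".join(sorted(freqs, key=freqs.get))

normal_alphabet = by_frequency(d1)   # "ate"
cypher_alphabet = by_frequency(d2)   # "zqx"

# str.maketrans pairs the characters of the two strings position by position.
transtable = str.maketrans(cypher_alphabet, normal_alphabet)
decoded = "qxz zq".translate(transtable)
print(decoded)  # tea at
```

Both strings must have the same length, so this only works when the two dictionaries cover the same number of letters.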

Thanks, well spotted. I've updated the answer to use the sorted() function instead. — Thomas K, Dec 13 '10 at 10:39

score 0 · Accepted Answer · answered Dec 13 '10 at 01:41

You don't really want to do what you're thinking of doing, because the frequencies of characters in the sample won't, in general, match the exact frequency distribution in the reference data. What you're really trying to do is find the most common character and replace it with 'e', the next most and replace it with 't', and so on.
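
To see why the exact comparison fails, here is a small Python 3 sketch. The sample string and the three reference values are made up for illustration; the point is only that the measured floats never equal the reference floats, while pairing by rank still works.

```python
from collections import Counter

# Hypothetical reference frequencies for three letters (illustrative values).
reference = {"e": 0.127, "t": 0.091, "a": 0.082}

sample = "xxxxqqqz"  # stands in for a (very short) normalised ciphertext
counts = Counter(sample)
measured = {letter: n / len(sample) for letter, n in counts.items()}

# Exact float equality: no measured frequency equals any reference value.
exact = [k for k, v in measured.items() if v in reference.values()]

def rank(freqs):
    """Letters of a {letter: frequency} dict, most frequent first."""
    return sorted(freqs, key=freqs.get, reverse=True)

# Pairing by rank: most common cipher letter with most common reference letter.
pairing = dict(zip(rank(measured), rank(reference)))

print(exact)    # []
print(pairing)  # {'x': 'e', 'q': 't', 'z': 'a'}
```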

So what we're going to do is the following:

1. (I assume you can already do this part) Construct a dictionary of actual letter frequency in the ciphertext.
2. We define a function that takes a {letter: frequency} dictionary and produces a list of the letters in order of frequency.
3. We get the letters, in order of frequency, in our reference (i.e., now we have an ordered list of the most common letters in English), and in the sample (similarly).
4. On the assumption that the most common letter in the sample corresponds to the most common letter in English, and so on: we create a new dictionary that maps each letter in the sample's list to the letter at the same position in the reference list. (We could also create a translation table for use with str.translate.) We'll make uppercase and lowercase versions of the same dictionary (I'll assume your original dictionaries have only lowercase) and merge them together.
5. We use this mapping to translate the cipher text, leaving other characters (spaces, punctuation, etc.) alone.

Thus:

# 2.
import operator
def sorted_histogram(a_dict):
  return [
    x[0] # the key (i.e. the letter)
    for x in sorted(a_dict.items(), key=operator.itemgetter(1))
    # of each dict item, sorted by value (i.e. the [1] element of each item).
  ]

# 3.
english_histogram = sorted_histogram(english_dict)
cipher_histogram = sorted_histogram(cipher_dict)

# 4.
# Make the lowercase version: cipher letter -> English letter of the same rank.
mapping = dict(zip(cipher_histogram, english_histogram))
# Make the uppercase version, and merge it in at the same time.
mapping.update(dict(
  (k.upper(), v.upper()) for (k, v) in zip(cipher_histogram, english_histogram)
))

# 5.
translated = ''.join( # make this list of characters, and string them together:
  mapping.get(c, c) # the mapped result, if possible; otherwise the original
  for c in cipher
)

# 6. Do whatever you want with 'translated' - write to file, etc.
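
Putting the steps together, one possible self-contained Python 3 sketch follows. The names `letters_by_frequency` and `decode_by_rank` and the toy frequency tables are assumptions for illustration; a real run would use a full 26-letter reference table.

```python
def letters_by_frequency(freqs):
    """Letters of a {letter: frequency} dict, most frequent first."""
    return sorted(freqs, key=freqs.get, reverse=True)

def decode_by_rank(cipher, cipher_freqs, english_freqs):
    """Replace each cipher letter with the English letter of the same rank."""
    pairs = list(zip(letters_by_frequency(cipher_freqs),
                     letters_by_frequency(english_freqs)))
    mapping = dict(pairs)                                     # lowercase
    mapping.update((c.upper(), e.upper()) for c, e in pairs)  # uppercase
    # Characters without a mapping (spaces, punctuation) pass through.
    return "".join(mapping.get(ch, ch) for ch in cipher)

# Toy tables (hypothetical values, three letters only).
english_freqs = {"e": 0.5, "t": 0.3, "a": 0.2}
cipher_freqs = {"x": 0.5, "q": 0.3, "z": 0.2}

result = decode_by_rank("Qxz, zq!", cipher_freqs, english_freqs)
print(result)  # Tea, at!
```

On real text the result is usually only a first guess: letters with similar frequencies often end up swapped and need fixing by hand.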

miku · Answer 3 · 2010-12-13T02:05:17.510

#!/usr/bin/env python
from operator import itemgetter
import string

def frequency(text):
    d = {}
    for letter in text:
        try:
            d[letter] += 1
        except KeyError:  # first time we see this letter
            d[letter] = 1
    return d

def alphabet():
    for alpha in string.letters: yield alpha

def cipher(text):
    expected = frequency(text)
    flist = sorted(expected.iteritems(), key=itemgetter(1), reverse=True)
    alphabet_generator = alphabet()
    for char, freq in flist:
        text = text.replace(char, alphabet_generator.next())
    return (text, expected)

def decipher(text, expected):
    nal = [ x[0] for x in sorted(expected.iteritems(), key=itemgetter(1), \
            reverse=True) ]
    normal_alphabet = ''.join(nal)
    transtable = string.maketrans(string.letters[:len(normal_alphabet)], \
                                  normal_alphabet)
    return text.translate(transtable)

Usage:

if __name__ == '__main__':
    s = "SUMMERTIMEANDTHELIVINGISEASYFISHESJUMPING"
    ciphered, expected = cipher(s)
    print s
    print ciphered
    print decipher(ciphered, expected)

# SUMMERTIMEANDTHELIVINGISEASYFISHESJUMPING
# ciddbpjadbfekjhbnaqaegacbfcrlachbcmidoaeg
# SUMMERTIMEANDTHELIVINGISEASYFISHESJUMPING
