
I am trying to fix code that takes a ciphered input file, runs a frequency analysis of its letters, and then decrypts the ciphered text. It works for the most part, but the ciphered text is not fully decrypted. Can I get some suggestions on how to fix it?

ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

cipher = open('cipher.txt', 'r').read()


def getLetterCount(message):
    alphabet = [chr(a + 65) for a in range(26)]
    letter_count = dict((x, 0) for x in alphabet)

    for letter in message.upper():
        if letter in LETTERS:
            letter_count[letter] += 1

    return letter_count


def getFreq(freqPair):
    return freqPair[0]


def getFrequencyOrder(message):
    letterToFreq = getLetterCount(message)

    freqToLetter = {}
    for letter in LETTERS:
        if letterToFreq[letter] not in freqToLetter:
            freqToLetter[letterToFreq[letter]] = [letter]
        else:
            freqToLetter[letterToFreq[letter]].append(letter)

    for freq in freqToLetter:
        freqToLetter[freq].sort(key=ETAOIN.find)
        freqToLetter[freq] = ''.join(freqToLetter[freq])

    print(freqToLetter)

    freq_pairs = list(freqToLetter.items())
    freq_pairs.sort(key=getFreq, reverse=True)

    freqOrder = []
    for freqPair in freq_pairs:
        freqOrder.append(freqPair[1])

    return ''.join(freqOrder)


mostFrequentLetters = getFrequencyOrder(cipher)

plaintext = ""
for letter in cipher:
    i = mostFrequentLetters.find(letter)
    plaintext += ETAOIN[i]

print(plaintext)

Here is the ciphered text; save it in a text file called cipher.txt to run the code. I am just looking for suggestions on how I could improve this code.

GBTBVAGBFBYVGHQRNZNAARRQFGBERGVERNFZHPUSEBZUVFPUNZORENFSEBZFBPVRGLVNZABGFBYVGNELJUVYFGVERNQNAQJEVGRGUBHTUABOBQLVFJVGUZROHGVSNZNAJBHYQORNYBARYRGUVZYBBXNGGURFGNEFGURENLFGUNGPBZRSEBZGUBFRURNIRAYLJBEYQFJVYYFRCNENGRORGJRRAUVZNAQJUNGURGBHPURFBARZVTUGGUVAXGURNGZBFCURERJNFZNQRGENAFCNERAGJVGUGUVFQRFVTAGBTVIRZNAVAGURURNIRAYLOBQVRFGURCRECRGHNYCERFRAPRBSGURFHOYVZRFRRAVAGURFGERRGFBSPVGVRFUBJTERNGGURLNERVSGURFGNEFFUBHYQNCCRNEBARAVTUGVANGUBHFNAQLRNEFUBJJBHYQZRAORYVRIRNAQNQBERNAQCERFREIRSBEZNALTRARENGVBAFGURERZRZOENAPRBSGURPVGLBSTBQJUVPUUNQORRAFUBJAOHGRIRELAVTUGPBZRBHGGURFRRAIBLFBSORNHGLNAQYVTUGGURHAVIREFRJVGUGURVENQZBAVFUVATFZVYRGURFGNEFNJNXRANPREGNVAERIRERAPRORPNHFRGUBHTUNYJNLFCERFRAGGURLNERVANPPRFFVOYROHGNYYANGHENYBOWRPGFZNXRNXVAQERQVZCERFFVBAJURAGURZVAQVFBCRAGBGURVEVASYHRAPRANGHERARIREJRNEFNZRNANCCRNENAPRARVGUREQBRFGURJVFRFGZNARKGBEGUREFRPERGNAQYBFRUVFPHEVBFVGLOLSVAQVATBHGNYYURECRESRPGVBAANGHERARIREORPNZRNGBLGBNJVFRFCVEVGGURSYBJREFGURNAVZNYFGURZBHAGNVAFERSYRPGRQGURJVFQBZBSUVFORFGUBHENFZHPUNFGURLUNQQRYVTUGRQGURFVZCYVPVGLBSUVFPUVYQUBBQJURAJRFCRNXBSANGHERVAGUVFZNAAREJRUNIRNQVFGVAPGOHGZBFGCBRGVPNYFRAFRVAGURZVAQJRZRNAGURVAGRTEVGLBSVZCERFFVBAZNQROLZNAVSBYQANGHENYBOWRPGFVGVFGUVFJUVPUQVFGVATHVFURFGURFGVPXBSGVZOREBSGURJBBQPHGGRESEBZGURGERRBSGURCBRGGURPUNEZVATYNAQFPNCRJUVPUVFNJGUVFZBEAVATVFVAQHOVGNOYLZNQRHCBSFBZRGJRAGLBEGUVEGLSNEZFZVYYREBJAFGUVFSVRYQYBPXRGUNGNAQZNAAVATGURJBBQYNAQORLBAQOHGABARBSGURZBJAFGURYNAQFPNCRGURERVFNCEBCREGLVAGURUBEVMBAJUVPUABZNAUNFOHGURJUBFRRLRPNAVAGRTENGRNYYGURCNEGFGUNGVFGURCBRGGUVFVFGURORFGCNEGBSGURFRZRAFSNEZFLRGGBGUVFGURVEJNEENAGLQRRQFTVIRABGVGYRGBFCRNXGEHYLSRJNQHYGCREFBAFPNAFRRANGHER


Here is what the output looks like with the original sample:

{329: 'AV', 299: 'B', 69: 'C', 3: 'DM', 252: 'E', 297: 'F', 432: 'G', 118: 'H', 36: 'I', 84: 'J', 2: 'K', 81: 'L', 354: 'N', 65: 'O', 107: 'P', 165: 'Q', 568: 'R', 99: 'S', 77: 'T', 281: 'U', 6: 'W', 20: 'X', 154: 'Y', 124: 'Z'}

TNYNIOTNSNLITUDEACAOOEEDSTNRETIREASCUMHWRNCHISMHACBERASWRNCSNMIETGIACONTSNLITARGFHILSTIREADAODFRITETHNUYHONBNDGISFITHCEBUTIWACAOFNULDBEALNOELETHICLNNKATTHESTARSTHERAGSTHATMNCEWRNCTHNSEHEAVEOLGFNRLDSFILLSEPARATEBETFEEOHICAODFHATHETNUMHESNOECIYHTTHIOKTHEAT


It should be something like this: a long, uppercase, run-on sentence that fully translates the ciphered text. This is just an example of what it should look like, not what it should actually translate to:

{113: 'A', 104: 'B', 31: 'LC', 0: 'D', 90: 'E', 114: 'UF', 166: 'G', 39: 'H', 13: 'I', 40: 'J', 1: 'MK', 122: 'N', 29: 'O', 41: 'P', 52: 'Q', 224: 'R', 32: 'S', 22: 'T', 120: 'V', 2: 'W', 9: 'X', 50: 'Y', 55: 'Z'}

CAPDOUBTFULLITSTOODASTWOSPENTSWIMMERSTHATDOECLINGTOGETHERANDCHOAKETHEIRARTTHEMERCILESSEMACDONWALDWORTHIETOBEAREBELLFORTOTHATTHEMULTIPLYINGVILLANIESOFNATUREDO...

FoxCode
  • Read [this article](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) for tips on debugging your code. – Code-Apprentice Jul 05 '22 at 01:04
  • You don't call any of your functions - `for letter in message.upper():` to `return letter_count` should belong to `getLetterCount()`. Also, please show us with a smaller sample data file what is output versus what you expected to be output. – Ken Y-N Jul 05 '22 at 01:54
  • In the original code, `return letter_count` and the for loop it belongs to are part of the `getLetterCount()` function; it just pasted weirdly. I hope the smaller sample size helps. – FoxCode Jul 05 '22 at 04:45
  • @FoxCode Please [edit] your question to fix the indentation. – Code-Apprentice Jul 05 '22 at 15:44
  • For what it's worth, I get the same bad result with this shorter (and likely faster) code: `print(cipher.translate(str.maketrans(''.join(sorted(ETAOIN, key=cipher.count, reverse=True)), ETAOIN)))` – Kelly Bundy Jul 05 '22 at 19:34
  • Where do the numbers `329: 'AV', 299: 'B', ...` come from? Your code actually does give me `113: 'A', 104: 'B', 31, ...`. – Kelly Bundy Jul 05 '22 at 19:56
  • The first result is using the original cipher text, I shortened it for the question to make it easier to read and paste. Sorry. – FoxCode Jul 05 '22 at 21:24

1 Answer


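The key point is that this is a Caesar cipher, so the OP's approach of mapping every letter by its frequency rank is the wrong tool: letter counts in a given text rarely match the ETAOIN order exactly, so most of those rank-based guesses miss. A Caesar cipher only rotates the alphabet, so a frequency analysis of more than the most common letter isn't needed.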

The most common letter in the solution text is likely E. The most common letter in the cipher text is R, so if R decrypts to E, the alphabet is rotated by 13 positions so that R aligns with E:

NOPQRSTUV... -> ABCDEFGHI...
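As a quick check of that alignment, here is a minimal sketch that derives the rotation from the assumed pairing R=E and builds the shifted alphabet. The helper name `caesar_alphabet` is hypothetical and only used for illustration.

```python
import string

LETTERS = string.ascii_uppercase


def caesar_alphabet(cipher_letter, plain_letter="E"):
    """Return (rotation, shifted alphabet) assuming cipher_letter decrypts to plain_letter."""
    # Distance between the two letters, wrapped into the range 0-25.
    rotation = (ord(cipher_letter) - ord(plain_letter)) % 26
    # Position i of the shifted alphabet decrypts to LETTERS[i].
    return rotation, LETTERS[rotation:] + LETTERS[:rotation]


rotation, shifted = caesar_alphabet("R")
print(rotation)                        # 13
print(shifted[:9], "->", LETTERS[:9])  # NOPQRSTUV -> ABCDEFGHI
```

A rotation of 13 is its own inverse, so for this particular text encrypting and decrypting are the same operation.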

Here's code to find the most common letter and translate the cipher. Since this is probably homework, I'll leave it to the OP to write a version without the import and without the built-in str.maketrans and str.translate:

import string

LETTERS = string.ascii_uppercase

with open('cipher.txt') as f:
    cipher = f.read()

most_common = max(LETTERS, key=cipher.count) # This letter is probably E

# Find the rotation as the difference between the ordinals of the most
# common letter and E, modulo 26 to give a number from 0 to 25.
rotation = (ord(most_common) - ord('E')) % 26

# Build the translation table
caesar = LETTERS[rotation:] + LETTERS[:rotation]
translation = str.maketrans(caesar, LETTERS)

print(cipher.translate(translation))

Output:

TOGOINTOSOLITUDEAMANNEEDSTORETIREASMUCHFROMHISCHAMBERASFROMSOCIETYIAMNOTSOLITARYWHILSTIREADANDWRITETHOUGHNOBODYISWITHMEBUTIFAMANWOULDBEALONELETHIMLOOKATTHESTARSTHERAYSTHATCOMEFROMTHOSEHEAVENLYWORLDSWILLSEPARATEBETWEENHIMANDWHATHETOUCHESONEMIGHTTHINKTHEATMOSPHEREWASMADETRANSPARENTWITHTHISDESIGNTOGIVEMANINTHEHEAVENLYBODIESTHEPERPETUALPRESENCEOFTHESUBLIMESEENINTHESTREETSOFCITIESHOWGREATTHEYAREIFTHESTARSSHOULDAPPEARONENIGHTINATHOUSANDYEARSHOWWOULDMENBELIEVEANDADOREANDPRESERVEFORMANYGENERATIONSTHEREMEMBRANCEOFTHECITYOFGODWHICHHADBEENSHOWNBUTEVERYNIGHTCOMEOUTTHESEENVOYSOFBEAUTYANDLIGHTTHEUNIVERSEWITHTHEIRADMONISHINGSMILETHESTARSAWAKENACERTAINREVERENCEBECAUSETHOUGHALWAYSPRESENTTHEYAREINACCESSIBLEBUTALLNATURALOBJECTSMAKEAKINDREDIMPRESSIONWHENTHEMINDISOPENTOTHEIRINFLUENCENATURENEVERWEARSAMEANAPPEARANCENEITHERDOESTHEWISESTMANEXTORTHERSECRETANDLOSEHISCURIOSITYBYFINDINGOUTALLHERPERFECTIONNATURENEVERBECAMEATOYTOAWISESPIRITTHEFLOWERSTHEANIMALSTHEMOUNTAINSREFLECTEDTHEWISDOMOFHISBESTHOURASMUCHASTHEYHADDELIGHTEDTHESIMPLICITYOFHISCHILDHOODWHENWESPEAKOFNATUREINTHISMANNERWEHAVEADISTINCTBUTMOSTPOETICALSENSEINTHEMINDWEMEANTHEINTEGRITYOFIMPRESSIONMADEBYMANIFOLDNATURALOBJECTSITISTHISWHICHDISTINGUISHESTHESTICKOFTIMBEROFTHEWOODCUTTERFROMTHETREEOFTHEPOETTHECHARMINGLANDSCAPEWHICHISAWTHISMORNINGISINDUBITABLYMADEUPOFSOMETWENTYORTHIRTYFARMSMILLEROWNSTHISFIELDLOCKETHATANDMANNINGTHEWOODLANDBEYONDBUTNONEOFTHEMOWNSTHELANDSCAPETHEREISAPROPERTYINTHEHORIZONWHICHNOMANHASBUTHEWHOSEEYECANINTEGRATEALLTHEPARTSTHATISTHEPOETTHISISTHEBESTPARTOFTHESEMENSFARMSYETTOTHISTHEIRWARRANTYDEEDSGIVENOTITLETOSPEAKTRULYFEWADULTPERSONSCANSEENATURE
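For comparison, the version without the import or str.maketrans/str.translate could look roughly like the sketch below. The function names `find_rotation` and `decrypt` are hypothetical, and the guess that the most common letter stands for E is an assumption that can fail on short or unusual texts.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def find_rotation(cipher, assumed_plain='E'):
    """Guess the Caesar rotation by assuming the most common letter is assumed_plain."""
    counts = {letter: 0 for letter in LETTERS}
    for ch in cipher.upper():
        if ch in counts:
            counts[ch] += 1
    most_common = max(LETTERS, key=lambda letter: counts[letter])
    return (LETTERS.index(most_common) - LETTERS.index(assumed_plain)) % 26


def decrypt(cipher, rotation):
    """Shift every letter back by rotation; leave other characters unchanged."""
    result = []
    for ch in cipher.upper():
        if ch in LETTERS:
            result.append(LETTERS[(LETTERS.index(ch) - rotation) % 26])
        else:
            # Newlines, spaces, and punctuation pass through untouched.
            result.append(ch)
    return ''.join(result)


# The first eight letters of the cipher text, with the rotation of 13 found above.
print(decrypt('GBTBVAGB', 13))  # TOGOINTO
```

Passing non-letters through unchanged also avoids a pitfall in the question's loop, where `find` returns -1 for a character such as a trailing newline and `ETAOIN[-1]` silently yields `Z`.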

Mark Tolonen
  • Thank you very much for the help. My homework is only to make recommendations on how to improve it and not actually fix this code. I will definitely be referencing your reply and this link to show that I got this recommendation from you, Mark Tolonen. I would like to know if you have any resources that would help me learn more about this topic and cipher/cryptography. – FoxCode Jul 05 '22 at 21:30
  • @FoxCode [Cracking Caesar Cipher](https://jrinconada.medium.com/cracking-caesar-cipher-8fe79226aabd) has some visualizations. The [Wikipedia](https://en.wikipedia.org/wiki/Caesar_cipher) article is good, too. – Mark Tolonen Jul 05 '22 at 22:07