How can I use English language letter frequencies in my Caesar Cipher deciphering assignment?

Question

I have an assignment to create a program that can decipher a Caesar Cipher that the user enters. The teacher provided us with a helper function:

double letterScore(char ch){
    if (ch == 'A' || ch == 'a') return .0684;
    if (ch == 'B' || ch == 'b') return .0139;
    if (ch == 'C' || ch == 'c') return .0146;
    if (ch == 'D' || ch == 'd') return .0456;
    if (ch == 'E' || ch == 'e') return .1267;
    if (ch == 'F' || ch == 'f') return .0234;
    if (ch == 'G' || ch == 'g') return .0180;
    if (ch == 'H' || ch == 'h') return .0701;
    if (ch == 'I' || ch == 'i') return .0640;
    if (ch == 'J' || ch == 'j') return .0033;
    if (ch == 'K' || ch == 'k') return .0093;
    if (ch == 'L' || ch == 'l') return .0450;
    if (ch == 'M' || ch == 'm') return .0305;
    if (ch == 'N' || ch == 'n') return .0631;
    if (ch == 'O' || ch == 'o') return .0852;
    if (ch == 'P' || ch == 'p') return .0136;
    if (ch == 'Q' || ch == 'q') return .0004;
    if (ch == 'R' || ch == 'r') return .0534;
    if (ch == 'S' || ch == 's') return .0659;
    if (ch == 'T' || ch == 't') return .0850;
    if (ch == 'U' || ch == 'u') return .0325;
    if (ch == 'V' || ch == 'v') return .0084;
    if (ch == 'W' || ch == 'w') return .0271;
    if (ch == 'X' || ch == 'x') return .0007;
    if (ch == 'Y' || ch == 'y') return .0315;
    if (ch == 'Z' || ch == 'z') return .0004;
    return 0.0;
}

Apparently, these numbers are the relative frequencies of letters in English, to be used to determine which candidate phrase is best to output as the deciphered phrase. For example, you'd expect 12.67% of the letters in a book to be the letter "e", so the function returns 0.1267 when the input letter is "e" (or "E").

That was the helper function. We are supposed to use it inside another function, decipher, which has a string parameter str. Decipher will be called 25 times within main and will decipher the string that the user inputs. The only issue is, I don't understand how I can use the helper function letterScore within decipher to work out how to decipher the given Caesar Cipher.

You could use a `std::map` if that helps. – πάντα ῥεῖ Dec 13 '20 at 20:39

score 1 · Answer 1 · πάντα ῥεῖ · 2020-12-13T20:50:37.753

I don't understand how I can use the helper function letterScore within decipher to discern how to decipher the Caesar Cipher given.

Well, you can first read the ciphered text and build a profile of the letters' occurrence, storing it in a std::map<char,double>, where the double value is the ratio (number of occurrences of that char) / (total number of letters in the text), expressed as a percentage.

The next step is to determine which values in the map you built are nearest to the standard percentages. If you find matches, you can replace the characters accordingly and present the result as a possible deciphering.

edited Dec 13 '20 at 20:50

answered Dec 13 '20 at 20:45

I don't really understand what you mean by map, sorry. We are expected to complete the project without things like arrays, vectors, maps. I'm kinda worried I'll lose credit for that. Can you please elaborate on what map means though? It sounds interesting. – Rahiz Khan Dec 13 '20 at 21:12
πάντα ῥεῖ, I don't seem to see any mention of map in the link you gave. – Rahiz Khan Dec 14 '20 at 00:58
Thank you! What does it mean to build a profile of the letters' occurrence though? I'm not sure how I'd do that. – Rahiz Khan Dec 14 '20 at 00:59

@RahizKhan Ooops, sorry. That was wrongly loaded into my clipboard buffer. I meant to link you here: https://en.cppreference.com/w/cpp/container/map – πάντα ῥεῖ Dec 14 '20 at 01:00
@RahizKhan _"What does it mean to build a profile of the letters' occurrence though?"_ It's a kind of _"dictionary"_, where the `char` is the lookup key, and the associated `double` (dictionary entry) is the calculated ratio of occurrence. – πάντα ῥεῖ Dec 14 '20 at 01:02

score 1 · Answer 2 · answered Dec 14 '20 at 14:24

To decipher a Caesar cipher, you try all 25 possible shifts and pick out the right one. This is called "Running down the alphabet", and it looks like this:

NBCM CM UH YRUGJFY
ocdn dn vi zsvhkgz
pdeo eo wj atwilha
qefp fp xk buxjmib
rfgq gq yl cvyknjc
sghr hr zm dwzlokd
this is an example
uijt jt bo fybnqmf
vjku ku cp gzcorng
wklv lv dq hadpsoh
xlmw mw er ibeqtpi
ymnx nx fs jcfruqj
znoy oy gt kdgsvrk
aopz pz hu lehtwsl
bpqa qa iv mfiuxtm
cqrb rb jw ngjvyun
drsc sc kx ohkwzvo
estd td ly pilxawp
ftue ue mz qjmybxq
guvf vf na rknzcyr
hvwg wg ob sloadzs
iwxh xh pc tmpbeat
jxyi yi qd unqcfbu
kyzj zj re vordgcv
lzak ak sf wpsehdw
mabl bl tg xqtfiex

What you do is generate each of the 25 shifts in turn, testing each one by adding the score for all the letters: "vjku" will score lower than "this" for example. Keep a note of the shift with the best score so far. When you have tried all 25 possible shifts, the one with the highest score should give the right answer.

score 0 · Answer 3 · answered Oct 15 '21 at 12:45

Cool. A Caesar Cipher code breaker.

Using brute force + heuristics.

OK, how do we do it?

First, we create an ultra-simple function for encoding/decoding. This has been shown so often that I will simply use it. If somebody wants an explanation for a year-old question, please comment, and I will add one.

Next, we simply try out all 25 possible keys and decrypt the message in a loop, once per key. Then, for each decrypted string, we iterate over its letters and calculate a score based on how often each letter occurs in the English language. We store the score and the key in a std::vector.

After the loop is finished, we sort the std::vector. The last element then holds the highest score, which most probably denotes the correct key.

Finally, we simply show the message decrypted with that key. That is most probably the correct solution.

All this can be done with 15 lines of simple code...

Please see:

#include <iostream>
#include <array>
#include <cctype>
#include <string>
#include <algorithm>
#include <utility>
#include <vector>

// Some test string
const std::string test{ R"(Espcp lcp xlyj mtr lyo dxlww wtmclctpd pgpcjhspcp ty zfc nzfyecj. Espj slgp xtwwtzyd zq mzzvd ty otqqpcpye wlyrflrpd. Jzf nly qtyo espcp esp 
zwopde lyo esp yphpde mzzvd.  Pgpcj dnszzw sld l wtmclcj. Afatwd nzxp ez esp wtmclcj ez elvp mzzvd zy otqqpcpye dfmupned.  Esp dnszzw wtmclcj hspcp Zwpr defotpd td rzzo. 
Te td l wlcrp nwply czzx. Espcp lcp qzfc mtr htyozhd ty te. Esp hlwwd lcp wtrse mwfp. Espcp lcp l wze zq dspwgpd qfww zq mzzvd. Jzf nly qtyo mzzvd zy wtepclefcp, asjdtnd, 
stdezcj, nspxtdecj, rpzrclasj, mtzwzrj lyo zespc dfmupned. Espcp lcp mzzvd ty Pyrwtds, ezz.  Zy esp hlwwd jzf nly dpp atnefcpd zq dzxp rcple hctepcd lyo azped.  Zy esp elmwp 
yplc esp htyozh jzf nly lwhljd dpp mplfetqfw dactyr lyo lfefxy qwzhpcd.  Zwpr wtvpd ez rz ez esp wtmclcj. Sp nly lwhljd qtyo espcp dzxpestyr yph, dzxpestyr sp yppod)" };

// Letter frequencies in the English Language
constexpr std::array<double, 26> LetterWeight{ .0684,.0139,.0146,.0456,.1267,.0234,.0180,.0701,.0640,.0033,.0093,.0450,.0305,.0631,.0852,.0136,.0004,.0534,.0659,.0850,.0325,.0084,.0271,.0007,.0315,.0004 };

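// Simple function for Caesar encryption/decryption: shifts every ASCII letter by key places, keeping its case
std::string caesar(const std::string& in, int key) {
    std::string res(in.size(), ' ');
    std::transform(in.begin(), in.end(), res.begin(), [&](char c) {
        // (c & 31) - 1 is the letter's index 0..25; (c & 32) restores lowercase
        return std::isalpha(static_cast<unsigned char>(c))
            ? static_cast<char>((((c & 31) - 1 + ((26 + (key % 26)) % 26)) % 26 + 65) | (c & 32))
            : c;
    });
    return res;
}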
// Test code
int main() {
    // We will try all possible keys 1..25; score[key] holds {total score, key}
    std::vector<std::pair<double, int>> score(26);
    for (int key = 1; key < 26; ++key) {
        score[key].second = key; // Store the key this score belongs to

        // Get one possible deciphered text
        for (const char c : caesar(test, key)) {

            // Add the English frequency weight of each letter; non-letters add 0
            score[key].first += std::isalpha(static_cast<unsigned char>(c)) ? LetterWeight[(c & 31) - 1] : 0.0;
        }
    }
    // Sort ascending by score, so the pair with the highest score ends up at the back
    std::sort(score.begin(), score.end());

    // And show the most probable result to the user.
    std::cout << "\n\n" << caesar(test, score.back().second) << "\n\n";
}
