
I'm a 2nd-year B. Comp. Sci. student with a cryptography assignment that's really giving me grief. We've been given a text file of transposition-encrypted English phrases and an English dictionary file, and asked to write a program that deciphers the phrases automatically, without any user input.

My first idea was to simply brute-force all possible permutations of the ciphertext, which should be trivial. However, I then have to decide which candidate is most likely to be the actual plaintext, and this is what I'm struggling with.

There's heaps of information on word segmentation here on SO, including this and this amongst other posts. Using this information and what I've already learned at uni, here's what I have so far:

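#include <cmath>      // ceil
#include <numeric>    // accumulate
#include <string>
#include <vector>

using namespace std;

// cipher: the encrypted message.
// dict:   the whole dictionary file as one string, one word per line.
string DecryptTransposition(const string& cipher, const string& dict)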
{
    vector<string> plain;

    int sz = cipher.size();
    int maxCols = ceil(sz / 2.0f);
    int maxVotes = 0, key = 0;

    // Iterate through all possible no.'s of cols.
    for (int c = 2; c <= maxCols; c++)
    {
        int r = sz / c;     // No. of complete rows if c is no. of cols.
        int e = sz % c;     // No. of extra letters if c is no. of cols.

        string cipherCpy(cipher);
        vector<string> table;
        table.assign(r, string(c, ' '));

        if (e > 0) table.push_back(string(e, ' '));
        for (int y = 0; y < c; y++)
        {
            for (int x = 0; x <= r; x++)
            {
                if (x == r && e-- < 1) break;
                table[x][y] = cipherCpy[0];
                cipherCpy.erase(0, 1);
            }
        }
        plain.push_back(accumulate(table.begin(),
            table.end(), string("")));

        // plain.back() now holds the plaintext
        // generated from cipher with key = c.
        // Greedily scan it for dictionary words, one vote per word found.
        int votes = 0;
        for (int i = 0, j = 2; (i + j) <= sz; )
        {
            string word = plain.back().substr(i, j);
            if (dict.find('\n' + word + '\n') == string::npos) j++;
            else
            {
                votes++;
                i += j;
                j = 2;
            }
        }
        if (votes > maxVotes)
        {
            maxVotes = votes;
            key = c;
        }
    }
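    // If no key scored a vote, key is still 0 and plain[key - 2] would be
    // out of range, so fall back to returning the ciphertext unchanged.
    if (key == 0) return cipher;
    return plain[key - 2];      // Minus 2 since we started from 2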
}
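
As an aside on the table-filling loop above: each call to cipherCpy.erase(0, 1) shifts the rest of the string, and the table of strings is only needed to be joined back together afterwards. A minimal sketch of the same column-by-column fill using index arithmetic is below. The name decryptColumns is hypothetical, and it assumes the same scheme as the code above (ciphertext read off column by column, with the first sz % c columns one letter taller).

```cpp
#include <string>

// Hypothetical helper: rebuilds the row-major plaintext for a table with
// `cols` columns whose ciphertext was read off column by column.
std::string decryptColumns(const std::string& cipher, int cols)
{
    const int sz = static_cast<int>(cipher.size());
    const int rows = sz / cols;    // number of complete rows
    const int extra = sz % cols;   // letters in the final, partial row

    std::string plain(sz, ' ');
    int next = 0;                  // next unread ciphertext position
    for (int col = 0; col < cols; ++col)
    {
        // The first `extra` columns are one letter taller than the rest.
        const int height = rows + (col < extra ? 1 : 0);
        for (int row = 0; row < height; ++row)
            plain[row * cols + col] = cipher[next++];
    }
    return plain;
}
```

For example, "HELLOWORLD" written into 3 columns and read off by column gives "HLODEORLWL", and decryptColumns("HLODEORLWL", 3) returns the original text.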

There are two main problems with this algorithm:

  1. It is incredibly slow, taking ~30 sec. to decrypt an 80-char. message.
  2. It isn't completely accurate (I'd elaborate on this if I hadn't already taken up a whole page, but you can try it for yourself with the full VC++ 2012 project).
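
One possible direction for both points, sketched under assumptions rather than offered as a definitive fix: dict.find() rescans the whole dictionary string for every candidate word, so loading the words into a hash set once should make each lookup roughly constant time. For accuracy, the greedy vote count could be swapped for a dynamic-programming score that counts how many letters can be covered by dictionary words, skipping letters that fit no word, so that a misspelt word costs only its own letters. The names loadWords and coverageScore and the maxLen cap of 20 are hypothetical, and the sketch assumes the dictionary and the messages use the same letter case.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: splits a whitespace/newline-separated dictionary
// string into a hash set so each lookup avoids a full scan.
std::unordered_set<std::string> loadWords(const std::string& dict)
{
    std::unordered_set<std::string> words;
    std::istringstream in(dict);
    std::string w;
    while (in >> w) words.insert(w);
    return words;
}

// Hypothetical scoring function: the largest number of letters in `text`
// that can be covered by non-overlapping dictionary words of 2..maxLen
// letters. Letters that belong to no word are simply skipped.
int coverageScore(const std::string& text,
                  const std::unordered_set<std::string>& words,
                  std::size_t maxLen = 20)
{
    const std::size_t n = text.size();
    std::vector<int> best(n + 1, 0);   // best[i] = best score for text[0..i)
    for (std::size_t i = 1; i <= n; ++i)
    {
        best[i] = best[i - 1];         // option 1: skip the letter at i - 1
        for (std::size_t len = 2; len <= std::min(i, maxLen); ++len)
        {
            // Option 2: end a dictionary word of `len` letters at position i.
            if (words.count(text.substr(i - len, len)))
                best[i] = std::max(best[i],
                                   best[i - len] + static_cast<int>(len));
        }
    }
    return best[n];
}
```

The candidate plaintext with the highest coverageScore would then take the place of the one with the most votes. Whether this actually picks the right answer on the assignment's test messages is untested here.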

Any suggestions on how I could improve this algorithm would be greatly appreciated. MTIA :-)

Kenny83
  • I find it interesting that answer.txt has a bunch of typos, which could be one of the reasons why it isn't completely accurate. Examples: EXCPET on the 3rd to last line, GODDREASO on the second to last line, DIFFFICULT on the last line. If you are trying to match your phrases to the dictionary file, it'll fail if you are matching those words. – hargobind Nov 06 '13 at 10:50
  • @hargobind Yeah the professor told us there could be some spelling mistakes in some of the test messages, and our algorithm is supposed to produce the corresponding answer whether this is the case or not. So the corresponding answers have those mistakes in them. If you don't mind telling me, what did you think of the code? – Kenny83 Nov 06 '13 at 12:03
  • I'm not a C++ programmer, so I can't give you a critique :) – hargobind Nov 06 '13 at 12:10
