I'm a second-year B.Comp.Sci. student and have a cryptography assignment that's really giving me grief. We've been given a text file of transposition-encrypted English phrases and an English dictionary file, then asked to write a program that deciphers the phrases automatically without any user input.
My first idea was to simply brute-force all possible permutations of the ciphertext, which should be trivial. However, I then have to decide which one is most likely to be the actual plaintext, and this is what I'm struggling with.
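To make the brute-force step concrete, here is a minimal sketch of undoing the transposition for a single candidate key. It assumes a plain columnar transposition (plaintext written row by row into `cols` columns, ciphertext read off column by column, with no column reordering); `undoColumnar` is a hypothetical helper name, not something given in the assignment.

```cpp
#include <cstddef>
#include <string>

// Rebuild the plaintext for one candidate column count.
// Assumption: the first (n % cols) columns hold one extra letter each.
std::string undoColumnar(const std::string& cipher, std::size_t cols)
{
    const std::size_t n = cipher.size();
    const std::size_t rows = n / cols;   // complete rows
    const std::size_t extra = n % cols;  // letters in the partial last row
    std::string plain(n, ' ');
    std::size_t k = 0;                   // next unread ciphertext letter
    for (std::size_t col = 0; col < cols; ++col)
    {
        const std::size_t height = rows + (col < extra ? 1 : 0);
        for (std::size_t row = 0; row < height; ++row)
            plain[row * cols + col] = cipher[k++];
    }
    return plain;
}
```

For example, `undoColumnar("HLODEORLWL", 3)` returns `"HELLOWORLD"`. Calling it for every column count from 2 up to half the message length would give the full set of candidates to score.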
There's heaps of information on word segmentation here on SO, including this and this amongst other posts. Using this information and what I've already learned at uni, here's what I have so far:
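    #include <cmath>    // ceil
    #include <numeric>  // accumulate
    #include <string>
    #include <vector>
    using namespace std;

    string DecryptTransposition(const string& cipher, const string& dict)
    {
        vector<string> plain;
        int sz = cipher.size();
        int maxCols = ceil(sz / 2.0f);
        int maxVotes = 0, key = 0;
        // Iterate through all possible no.'s of cols.
        for (int c = 2; c <= maxCols; c++)
        {
            int r = sz / c; // No. of complete rows if c is no. of cols.
            int e = sz % c; // No. of extra letters if c is no. of cols.
            string cipherCpy(cipher);
            vector<string> table;
            table.assign(r, string(c, ' '));
            if (e > 0) table.push_back(string(e, ' '));
            // Fill the table column by column; only the first e columns
            // have an entry in the extra (partial) row.
            for (int y = 0; y < c; y++)
            {
                for (int x = 0; x <= r; x++)
                {
                    if (x == r && e-- < 1) break;
                    table[x][y] = cipherCpy[0];
                    cipherCpy.erase(0, 1);
                }
            }
            // Read the table row by row to get the candidate plaintext.
            plain.push_back(accumulate(table.begin(),
                                       table.end(), string("")));
            // plain.back() now holds the plaintext
            // generated from cipher with key = c
            int votes = 0;
            // Greedy segmentation: grow the window from 2 letters until it
            // matches a dictionary word, then restart after that word.
            for (int i = 0, j = 2; (i + j) <= sz; )
            {
                string word = plain.back().substr(i, j);
                if (dict.find('\n' + word + '\n') == string::npos) j++;
                else
                {
                    votes++;
                    i += j;
                    j = 2;
                }
            }
            if (votes > maxVotes)
            {
                maxVotes = votes;
                key = c;
            }
        }
        // If no candidate scored any votes, key is still 0 and
        // plain[key - 2] would be out of range, so return the input as-is.
        if (key == 0) return cipher;
        return plain[key - 2]; // Minus 2 since we started from 2
    }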
There are two main problems with this algorithm:
- It is incredibly slow, taking ~30 sec. to decrypt an 80-char. message.
- It isn't completely accurate (I'd elaborate on this if I hadn't already taken up a whole page, but you can try it for yourself with the full VC++ 2012 project).
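One direction that might help with both problems, offered as a sketch rather than a tested fix: `dict.find` scans the whole dictionary string for every candidate substring, which is likely where much of the time goes, and the greedy loop commits to the shortest matching word at each position. The code below loads the dictionary into a hash set once and scores a candidate by the largest number of letters that dictionary words can cover, using dynamic programming. The names `loadWords`, `coveredChars`, and `maxLen` are hypothetical, and it assumes the dictionary holds one word per line in the same letter case as the candidate text.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Split the whitespace/newline-separated dictionary text into a hash set.
std::unordered_set<std::string> loadWords(const std::string& dict)
{
    std::unordered_set<std::string> words;
    std::istringstream in(dict);
    std::string w;
    while (in >> w) words.insert(w);
    return words;
}

// Largest number of characters of text that can be covered by
// non-overlapping dictionary words of length 2..maxLen.
std::size_t coveredChars(const std::string& text,
                         const std::unordered_set<std::string>& words,
                         std::size_t maxLen = 20)
{
    // best[i] = best coverage achievable within the first i characters.
    std::vector<std::size_t> best(text.size() + 1, 0);
    for (std::size_t i = 1; i <= text.size(); ++i)
    {
        best[i] = best[i - 1]; // leave character i-1 uncovered
        for (std::size_t len = 2; len <= std::min(i, maxLen); ++len)
        {
            // Does a dictionary word end exactly at position i?
            if (words.count(text.substr(i - len, len)))
                best[i] = std::max(best[i], best[i - len] + len);
        }
    }
    return best[text.size()];
}
```

With the set built once up front, each candidate plaintext could be scored with `coveredChars` in place of the vote count, keeping the candidate with the highest coverage. Whether this ranks the true plaintext first on the assignment's data is untested here.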
Any suggestions on how I could improve this algorithm would be greatly appreciated. MTIA :-)