For fun, I decided to revisit an old college assignment in which we were given a ciphertext of about 75 characters and a crib: the message was signed with three letters (my teacher's initials).

What I've done:

  1. Narrowed the results down to those that contain part or all of the crib.
  2. Started doing some letter frequency analysis on the smaller subset of results from (1).

Now the task boils down to writing some language recognition software, but there is an issue to deal with first. I chose to brute-force all the rotor settings (type and initial position), so the resulting candidates that contain part or all of the crib still have some letters swapped by the plugboard.

I know my next move should be to build two matrices and digest a corpus. The first matrix is just a tally: if the first letter is an A, I go to row A (index 0) and increment the column for the letter directly following it, say B. Then I move on to the B, see that the next letter is a U, and increment column U in row B. After digesting the whole corpus, I would fill the second matrix with the corresponding probabilities.
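The two matrices described above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, Python 3 is assumed, and non-letters are simply dropped (so letters on either side of a space count as adjacent), which may or may not suit your corpus.

```python
import string

ALPHABET = string.ascii_uppercase


def bigram_counts(text):
    """Tally how often each letter is directly followed by each other letter."""
    counts = [[0] * 26 for _ in range(26)]
    # Assumption: keep only A-Z and ignore everything else in the corpus.
    letters = [c for c in text.upper() if c in ALPHABET]
    for first, second in zip(letters, letters[1:]):
        counts[ord(first) - ord('A')][ord(second) - ord('A')] += 1
    return counts


def bigram_probabilities(counts):
    """Turn each row of tallies into P(next letter | current letter)."""
    probs = []
    for row in counts:
        total = sum(row)
        # A letter never seen in the corpus gets an all-zero row.
        probs.append([n / total if total else 0.0 for n in row])
    return probs
```

Normalizing per row gives conditional probabilities; dividing every cell by the grand total instead would give joint bigram probabilities, which is an equally reasonable reading of "put probabilities into the second matrix".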

Using the second matrix, I could assign a score to an entire sentence, which gives me a way to rank the outputs and narrow the results down further, so finding the message should be as easy as finding a pin in a MUCH smaller haystack.
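One possible way to score a candidate sentence against such a probability matrix is sketched below. Summing log probabilities instead of multiplying raw probabilities is my own choice here (it avoids tiny products), and both the `score` name and the `floor` value are assumptions made for illustration.

```python
import math


def score(text, probs, floor=1e-6):
    """Sum the log probability of every adjacent letter pair in text.

    probs is a 26x26 matrix where probs[i][j] is the probability that
    letter j follows letter i. Higher (less negative) scores look more
    like the corpus the matrix was built from.
    """
    letters = [c for c in text.upper() if 'A' <= c <= 'Z']
    total = 0.0
    for first, second in zip(letters, letters[1:]):
        p = probs[ord(first) - ord('A')][ord(second) - ord('A')]
        # Assumption: clamp unseen pairs to a small floor so log() is defined.
        total += math.log(max(p, floor))
    return total
```

Since all candidates here come from the same ciphertext and therefore have the same length, their scores can be compared directly without normalizing by length.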

I'm doing this in Python, and I want to know whether it is better to convert each character to an int with ord(), subtract ord('A'), and use the result as my index, or to use a dict that maps every letter to an int, so that indexing into my matrices would look something like LetterTally[dict['A']][dict['B']].

The cast subtraction method would look like this:

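    # 26x26 tally matrix, one row and one column per letter A-Z
    LetterTally = [[0] * 26 for _ in range(26)]

    firstChar = 'A'
    secondChar = 'B'

    LetterTally[ord(firstChar) - ord('A')][ord(secondChar) - ord('A')] += 1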

Of these two different methods, which is going to be faster?
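One way to settle this for a particular machine and Python version is to time both lookups with the standard-library `timeit` module. The sketch below is a minimal benchmark under assumed names (`tally`, `index`, `lookup_ord`, `lookup_dict`); it prints timings and makes no claim about which approach wins.

```python
import string
import timeit

# Shared 26x26 tally matrix and a letter -> index mapping.
tally = [[0] * 26 for _ in range(26)]
index = {letter: i for i, letter in enumerate(string.ascii_uppercase)}


def lookup_ord(first, second):
    """Index the matrix via ord() arithmetic."""
    return tally[ord(first) - ord('A')][ord(second) - ord('A')]


def lookup_dict(first, second):
    """Index the matrix via a dict that maps letters to ints."""
    return tally[index[first]][index[second]]


if __name__ == "__main__":
    for fn in (lookup_ord, lookup_dict):
        seconds = timeit.timeit(lambda: fn('A', 'B'), number=200000)
        print(fn.__name__, seconds)
```

Building the tally from a corpus is a one-time cost, so the difference only matters if the lookup sits in the inner loop that scores a large number of candidates.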

tshepang
    Why not measure it yourself? Python provides a convenient module, [timeit](http://docs.python.org/library/timeit.html), for this purpose. – Marcelo Cantos Mar 14 '12 at 12:20

1 Answer

Instead of building a matrix, have you considered a dict of dicts, so that you can do the lookup (LetterTally['A']['B']) directly?
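A minimal sketch of what that could look like, using `collections.defaultdict` so missing entries start at zero. The `bigram_tally` helper name and the choice to keep only A-Z are assumptions for illustration.

```python
from collections import defaultdict


def bigram_tally(text):
    """Count letter pairs in a dict of dicts keyed directly by letter."""
    tally = defaultdict(lambda: defaultdict(int))
    # Assumption: only A-Z matter; everything else is skipped.
    letters = [c for c in text.upper() if 'A' <= c <= 'Z']
    for first, second in zip(letters, letters[1:]):
        tally[first][second] += 1
    return tally


letter_tally = bigram_tally("ABUAB")
print(letter_tally['A']['B'])
```

One thing to keep in mind is that reading a missing key from a `defaultdict` inserts it, so merely looking up a pair that never occurred will grow the structure slightly.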

Karl Knechtel
  • That hadn't even crossed my mind. I am going to look further into this. Still, the question remains: is it better to cast and subtract or to do dict lookups? – user1268899 Mar 16 '12 at 23:21
  • Afterthought: wouldn't an array lookup be faster than a dict lookup? – user1268899 Aug 10 '12 at 16:57