
I've been given the task of finding the 0s and 1s in a printed textbook and decoding the ASCII message they form. The problem is that it's really hard to spot every 0 and 1, and I suspect I'm skipping quite a few of them. Every missed bit shifts all the bits after it, which completely breaks the ASCII conversion. Some of the things I've tried:

  • resynchronizing on word boundaries by detecting spaces (or something close to a space)
  • correcting characters based on the assumption that the message contains only letters (a-z, A-Z)
  • correcting words based on letter frequencies in the language (Dutch)
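The first two ideas work because of how letters are laid out in 8-bit ASCII: a space is 00100000, uppercase A-Z runs from 01000001 to 01011010, and lowercase a-z from 01100001 to 01111010. A decoded byte that is none of these hints that a bit was skipped somewhere before it. A minimal sketch, assuming the message contains only letters and spaces; `is_plausible` and `first_bad_byte` are hypothetical helper names:

```python
def is_plausible(byte_bits: str) -> bool:
    """True if an 8-bit group decodes to an ASCII letter or a space."""
    ch = chr(int(byte_bits, 2))
    return ch == " " or ("A" <= ch <= "Z") or ("a" <= ch <= "z")


def first_bad_byte(bits: str):
    """Return the bit offset of the first implausible 8-bit group, or None.

    Assumes the stream starts on a character boundary; a trailing
    group shorter than 8 bits is ignored.
    """
    for start in range(0, len(bits) - 7, 8):
        if not is_plausible(bits[start:start + 8]):
            return start
    return None


# Hypothetical test message, encoded as 8 bits per character.
message_bits = "".join(format(ord(c), "08b") for c in "Hi you")
damaged = message_bits[:9] + message_bits[10:]  # drop one bit inside the 'i'
print(first_bad_byte(message_bits))  # None
print(first_bad_byte(damaged))       # 16
```

The damaged group shows up at bit 16, one character after the bit that was actually lost: the shifted 'i' still happened to decode to a letter ('R'). Plausibility checks alone therefore only narrow down where synchronization was lost.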

But I still didn't get much out of it, and the main problem remains synchronization (where does a new character start?). I'll probably have to go through the books again (sigh, for the third time or so), but I was wondering if anyone has other ideas for dealing with missing bits in an ASCII binary stream.
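One way to attack synchronization is to stop committing to a single alignment and let a dynamic program decide, character by character, whether the next character was read intact (8 bits) or with one bit skipped (7 bits, repaired by re-inserting the most plausible bit). This Viterbi-style segmentation is not from the original thread; it assumes at most one missed bit per character, no spurious extra bits, and a message of letters and spaces. `resync_decode`, the +1/-5 scores and `skip_penalty` are hypothetical choices:

```python
from typing import List, Optional, Tuple

# Assumption: the hidden message uses only ASCII letters and spaces.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ")


def char_score(byte_bits: str) -> float:
    """+1 for a letter or space, -5 for anything else (hypothetical weights)."""
    return 1.0 if chr(int(byte_bits, 2)) in ALLOWED else -5.0


def best_repair(seven_bits: str) -> Tuple[float, str]:
    """Assume one bit was skipped: try re-inserting 0 or 1 at every position."""
    best_score, best_char = float("-inf"), ""
    for pos in range(8):
        for bit in "01":
            candidate = seven_bits[:pos] + bit + seven_bits[pos:]
            score = char_score(candidate)
            if score > best_score:  # strict '>': ties keep the earliest candidate
                best_score, best_char = score, chr(int(candidate, 2))
    return best_score, best_char


def resync_decode(bits: str, skip_penalty: float = 2.0) -> str:
    """Split bits into 8-bit (intact) or 7-bit (one bit missed) chunks,
    maximizing the total plausibility score."""
    n = len(bits)
    # best[i] holds (score, decoded text) for the best parse of bits[:i]
    best: List[Optional[Tuple[float, str]]] = [None] * (n + 1)
    best[0] = (0.0, "")
    for i in range(n):
        if best[i] is None:
            continue  # no parse ends exactly at bit i
        score_i, text_i = best[i]
        options = []
        if i + 8 <= n:  # character read intact
            chunk = bits[i:i + 8]
            options.append((i + 8, score_i + char_score(chunk),
                            text_i + chr(int(chunk, 2))))
        if i + 7 <= n:  # one bit of this character was skipped
            s, ch = best_repair(bits[i:i + 7])
            options.append((i + 7, score_i + s - skip_penalty, text_i + ch))
        for j, score, text in options:
            if best[j] is None or score > best[j][0]:
                best[j] = (score, text)
    # Fall back to the longest prefix that could be parsed at all.
    end = max(i for i in range(n + 1) if best[i] is not None)
    return best[end][1]


intact = "".join(format(ord(c), "08b") for c in "Hi you")
damaged = intact[:9] + intact[10:]  # drop one bit inside the 'i'
print(resync_decode(damaged))
```

With short messages several repairs can tie: here "HR you" (repairing the space instead of the 'i') scores the same as "Hi you", and the earlier candidate wins only because of processing order. Replacing `char_score` with Dutch letter frequencies, as in the third bullet above, would be a natural way to break such ties.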

  • This question does not make sense in its current form... please provide a small example input and the expected output (at the very least). It's probably also a good idea to post some code you have tried that did not give the correct results, and explain in what way the results are wrong. – Joran Beasley Sep 28 '16 at 23:49
  • I'm not sure what you mean by "in a real textbook" --- are you scanning a printed book (on paper) and somehow losing bits from the scanner device (or from the scanned image-or-data file)? Are you scanning a book and looking for printed "0" and "1" characters, but the scanner is not recording all the characters (or misidentifying them as capital-O or lower-case-L)? Or are you starting with a complete file representing a book's contents (PDF? ASCII? LaTeX? XML? bitmap?), and somehow not getting all the data you expected out of it? – Kevin J. Chase Sep 29 '16 at 00:10

1 Answer

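import re
import sys

corpus_of_text = sys.stdin.read()  # the book's text, e.g. piped in from a file
all_ones_and_zeros = re.findall("[01]", corpus_of_text)
BITS_PER_ASCII = 8  # ASCII needs only 7 bits (0-127); assume each char was written as 8 bits
# zip(*[it] * n) pulls n items at a time from ONE shared iterator, giving n-bit groups
asciis = zip(*[iter(all_ones_and_zeros)] * BITS_PER_ASCII)
bins = [''.join(x) for x in asciis]
chars = [chr(int(y, 2)) for y in bins]

print("MSG:", ''.join(chars))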

I guess... it's not very clear what your input or expected output is.
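To make the expected input and output concrete, the same approach can be wrapped in a function and run on an invented sentence (not from any real textbook); `decode_bits` is a hypothetical name. The sample shows that digits from separate numbers merge into one stream and that leftover bits at the end are silently dropped by `zip`:

```python
import re


def decode_bits(corpus_of_text: str, bits_per_char: int = 8) -> str:
    """Collect every '0'/'1' in the text and decode fixed-width groups.

    Any trailing group shorter than bits_per_char is dropped by zip().
    """
    bits = re.findall("[01]", corpus_of_text)
    groups = zip(*[iter(bits)] * bits_per_char)
    return "".join(chr(int("".join(g), 2)) for g in groups)


sample = "Fig. 0100 shows 1000; see table 01101001 and page 1."
print(decode_bits(sample))  # Hi
```

A bit count that isn't a multiple of 8 (here 17) is a cheap sign that bits were missed or picked up by mistake, although a multiple of 8 doesn't prove the stream is clean.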

Joran Beasley