ASCII stream with missing bits (no parity)

Question

I've been given a task to look for 0's and 1's in a real textbook in order to decipher an ASCII message from it. The problem is that it's really hard to find all 0's and 1's and I have the feeling I am skipping a lot of them. This completely messes up the ASCII conversion. Some of the things I tried:

'synchronize' words by detecting spaces (or something close to a space)
trying to correct chars based on assumption of only alphabet characters (a-z, A-Z)
trying to correct words based on the assumption of frequency of chars in the language (Dutch)

But I still didn't get much out of it with the main problem being synchronization (when does a new char start?). I probably have to run through the books again (sigh, 3rd time or so) but I was wondering if you guys have any other ideas for the problem of missing bits in an ASCII binary stream?

This question does not make sense in its current form. Please provide, at the very least, a small sample of example input and the expected output. It's probably also a good idea to post some code you have tried that did not give the correct results, and to explain in what way the results are wrong. — Joran Beasley, Sep 28 '16 at 23:49
I'm not sure what you mean by "in a real textbook" --- are you scanning a printed book (on paper) and somehow losing bits from the scanner device (or from the scanned image-or-data file)? Are you scanning a book and looking for printed "0" and "1" characters, but the scanner is not recording all the characters (or misidentifying them as capital-O or lower-case-L)? Or are you starting with a complete file representing a book's contents (PDF? ASCII? LaTeX? XML? bitmap?), and somehow not getting all the data you expected out of it? — Kevin J. Chase, Sep 29 '16 at 00:10

score 0 · Answer 1 · answered Sep 28 '16 at 23:52

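import re

# corpus_of_text: the text transcribed from the book, as one string
all_ones_and_zeros = re.findall("[01]", corpus_of_text)
BITS_PER_ASCII = 8  # ASCII code points are 0-127 (7 bits), usually stored one per 8-bit byte
# zip(*[iterator] * n) groups the stream into n-tuples; a trailing partial group is dropped
asciis = zip(*[iter(all_ones_and_zeros)] * BITS_PER_ASCII)
bins = [''.join(x) for x in asciis]
chars = [chr(int(y, 2)) for y in bins]

print("MSG:", ''.join(chars))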

I guess ... it's not very clear what your input or expected output is.
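The snippet above only works if no bits were skipped, because a single missing bit shifts every later 8-bit group. One possible way to attack the synchronization problem from the question is sketched below. It is not part of the original answer, and it rests on assumptions: each character is stored as 8 bits, most significant bit first; the message contains only letters and spaces (as the question suggests); and a character loses at most one bit. The names `ALLOWED`, `candidates` and `decode` are made up for this illustration. The idea is a small dynamic program that reads either 8 bits (an intact character) or 7 bits (a character with one dropped bit) at each step and keeps the decoding that assumes the fewest dropped bits.

```python
import string

# Assumption: the hidden message uses only letters and spaces.
ALLOWED = set(string.ascii_letters + " ")


def candidates(bits7):
    """Allowed characters that can be formed by inserting one bit into 7 bits."""
    found = set()
    for i in range(8):          # insertion position, 7 means "append at the end"
        for b in "01":
            ch = chr(int(bits7[:i] + b + bits7[i:], 2))
            if ch in ALLOWED:
                found.add(ch)
    return sorted(found)


def decode(bits):
    """Return (assumed_dropped_bits, text) with the fewest assumed drops, or None."""
    n = len(bits)
    # best[i] holds the cheapest (cost, text) found for the prefix bits[:i]
    best = [None] * (n + 1)
    best[0] = (0, "")
    for i in range(n):
        if best[i] is None:
            continue            # no valid decoding ends at this offset
        cost, text = best[i]
        # Option 1: the next 8 bits form an intact allowed character.
        if i + 8 <= n:
            ch = chr(int(bits[i:i + 8], 2))
            if ch in ALLOWED and (best[i + 8] is None or cost < best[i + 8][0]):
                best[i + 8] = (cost, text + ch)
        # Option 2: the next 7 bits are a character that lost one bit.
        if i + 7 <= n:
            opts = candidates(bits[i:i + 7])
            if opts and (best[i + 7] is None or cost + 1 < best[i + 7][0]):
                # An ambiguous character is shown as a bracketed list of options.
                shown = opts[0] if len(opts) == 1 else "[" + "".join(opts) + "]"
                best[i + 7] = (cost + 1, text + shown)
    return best[n]


message_bits = "".join(format(ord(c), "08b") for c in "Hi there")
print(decode(message_bits))          # (0, 'Hi there')
print(decode(message_bits[1:])[0])   # 1 (one bit assumed missing)
```

A damaged character usually has several possible repairs, which is why the sketch prints them in brackets instead of guessing. Picking among them is where a dictionary or the Dutch letter frequencies mentioned in the question could come in.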
