I want to convert a file containing "1"s and "0"s (the textual equivalent of a binary file) back to its original JPG format, that is, convert it back to binary

That is, I have a file consisting entirely of 1's and 0's, which I generated from a JPG image using the following function:

    def convert_binary(inpath, outpath):
        # Lookup table: byte value -> 8-character string of '0'/'1'
        byte2str = ["{:08b}".format(i) for i in range(256)]
        with open(inpath, "rb") as fin:
            with open(outpath, "w") as fout:
                data = fin.read(1024)
                while data:
                    # Python 2: data is a str, so ord() yields each byte's value
                    for b in map(ord, data):
                        fout.write(byte2str[b])
                    data = fin.read(1024)

    convert_binary("image.jpg", "binary_file.txt")

thanks to Tim Peters

I now want to convert this file of 1's and 0's back to the original image. Any help would be appreciated.

P.S.: I am really sorry for such a trivial question. I am a biotechnology major and Python programming is not my forte. I am experimenting with an app for my thesis and have got stuck.

Meet
  • I'm confused, what does the initial file represent? The actual bits of the JPG data, or a monochrome bitmap (or something else)? – Thomas Nov 23 '13 at 12:00
  • @Thomas I initially had a JPG image, which I converted to 1's and 0's (that is, I wrote out the binary content of the JPEG as text made of the characters 1 and 0). I now want to convert this text file back into the original image. – Meet Nov 23 '13 at 12:10
  • Out of curiosity, what do you do with the data? – Hannes Ovrén Nov 23 '13 at 15:45

2 Answers

1

In the same vein as Steve's answer:

with open('input', 'rb', 1024) as fin, open('output', 'wb') as fout:
    fout.writelines(chr(int(chunk, 2)) for chunk in iter(lambda: fin.read(8), ''))
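
The snippet above relies on Python 2 string semantics. For readers on Python 3, a rough port might look like the sketch below. It is not part of the original answer: the function name `bits_to_bytes` is made up for illustration, and it assumes the input holds only the characters 0 and 1 (no trailing newline) with a length that is a multiple of 8.

```python
import io

def bits_to_bytes(fin, fout):
    """Copy a text stream of '0'/'1' characters to a binary stream, 8 characters per byte.

    Hypothetical helper: assumes well-formed input whose length is a multiple of 8.
    """
    # Two-argument iter() keeps calling fin.read(8) until it returns the sentinel "".
    for chunk in iter(lambda: fin.read(8), ""):
        fout.write(bytes([int(chunk, 2)]))

# In-memory demonstration; with real files, open the input in text mode ("r")
# and the output in binary mode ("wb").
src = io.StringIO("0100100001101001")
dst = io.BytesIO()
bits_to_bytes(src, dst)
result = dst.getvalue()  # the two bytes 0x48 and 0x69
```
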
Jon Clements
  • And the encoder could be re-written similarly, with `fout.writelines(byte2str[ord(b)] for b in iter(lambda: fin.read(1), ''))`. My concern is, is `read(8)` allowed to return 7 bytes in Python when there are 8 bytes left in the file, as `fread` is allowed to do in C? My answer assumes that it is allowed, the docs for `io.RawIOBase` just say, "Read up to n bytes from the object and return them." – Steve Jessop Nov 23 '13 at 12:27
  • @SteveJessop indeed... Your approach wins for clarity though - the 2-argument `iter` and the use of `writelines` isn't immediately clear. I just "golfed" your code a little for future reference... – Jon Clements Nov 23 '13 at 12:31
  • @SteveJessop Yup - read is allowed to return less than the requested size... However, if the filesize isn't divisible by 8 then I'd say that's a different problem... – Jon Clements Nov 23 '13 at 12:32
  • But might it (in theory) return less than 8 for a different reason than end of file? For example, an underlying async I/O could (again, in theory) partially complete but be interrupted by a signal or by an intermediate buffer size. The question then is whether Python blocks until all 8 bytes are available. If not then `read(8)` could return a partial read even if the file size is OK. In practice I very much doubt it would happen when `fin` refers to a regular file. If it was a socket then you'd need to know. – Steve Jessop Nov 23 '13 at 12:33
  • @SteveJessop for a normal file, if less would be returned for any reason other than there is actually less than the data requested, an exception would be raised... For pipes/constantly updated files/async streams it's a different kettle of fish – Jon Clements Nov 23 '13 at 12:38
  • OK, so whether this code is *completely* correct depends on whether the OS allows the filename `'input'` to refer to a pipe or a fishkettle instead of a normal file? Or does `open` ensure that the object returned always does complete reads to the size specified (or end-of-file), regardless of what's underneath? – Steve Jessop Nov 23 '13 at 12:39
  • @SteveJessop yes... because catering for that also requires considering what should happen with infinite input streams, a reasonable timeout condition and such... I think it's safe to say, given the OP's previous code, that they're dealing with "normal" files – Jon Clements Nov 23 '13 at 12:41
0

You can reverse x = byte2str[b] with int(x,2) and you can reverse ord with chr. Your .txt file contains 8 characters for each byte of the original jpg. So your code should look like:

data = fin.read(1024)
while data:
    for i in range(0, len(data), 8):
        fout.write(chr(int(data[i:i+8], 2)))
    data = fin.read(1024)

Unfortunately, read isn't guaranteed to return exactly the number of bytes you ask for; it's allowed to return fewer. So we need to complicate things:

data = fin.read(1024)
while data:
    leftover = ''  # reset each pass so the last line never sees an undefined or stale value
    if len(data) % 8 != 0:
        # partial read: hold back the trailing incomplete group for the next pass
        endidx = len(data) - len(data) % 8
        leftover = data[endidx:]
        data = data[:endidx]
        if len(data) == 0:
            raise ValueError('invalid file, length is not a multiple of 8')
    for i in range(0, len(data), 8):
        fout.write(chr(int(data[i:i+8], 2)))
    data = leftover + fin.read(1024)
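
The same carry-over idea can be sketched for Python 3, where the output must be written as `bytes`. This is only an illustration under stated assumptions: the name `decode_bit_text` and the `chunk_size` parameter are hypothetical, `fin` is assumed to be a text stream and `fout` a binary stream, and the length check is deferred until end of input so that a short read in the middle of the file is not mistaken for a truncated file.

```python
import io

def decode_bit_text(fin, fout, chunk_size=1024):
    """Decode a text stream of '0'/'1' characters into bytes, tolerating short reads.

    Hypothetical helper: fin is a text stream, fout a binary stream.
    """
    leftover = ""
    while True:
        chunk = fin.read(chunk_size)
        if not chunk:  # empty string means end of input
            break
        data = leftover + chunk
        usable = len(data) - len(data) % 8  # largest prefix made of whole 8-character groups
        fout.write(bytes(int(data[i:i + 8], 2) for i in range(0, usable, 8)))
        leftover = data[usable:]  # carry the incomplete group into the next pass
    if leftover:
        raise ValueError("invalid file, length is not a multiple of 8")

# In-memory demonstration with a deliberately tiny chunk size.
decoded = io.BytesIO()
decode_bit_text(io.StringIO("0100100001101001"), decoded, chunk_size=3)
decoded_bytes = decoded.getvalue()
```

With real files this would be called as `decode_bit_text(open("binary_file.txt", "r"), open("image.jpg", "wb"))`, ideally inside a `with` block so both files are closed.
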

There are much better ways to represent a binary file as text though, for example base64 encoding.
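
As a rough illustration of the difference, the sketch below encodes the same four bytes both ways using the standard `base64` module. The sample bytes are an assumption chosen for the demo (they match the start of a typical JPEG/JFIF file), not data from the question.

```python
import base64

# Sample data for the demo: the first four bytes of a typical JPEG/JFIF file.
raw = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# base64: 4 text characters for every 3 input bytes (plus padding).
b64_text = base64.b64encode(raw).decode("ascii")
restored = base64.b64decode(b64_text)

# The 1's-and-0's representation: 8 text characters for every input byte.
bit_text = "".join("{:08b}".format(b) for b in raw)

print(b64_text, len(b64_text))
print(bit_text, len(bit_text))
```

Decoding is a single `base64.b64decode` call, so no manual chunking into groups of 8 characters is needed.
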

Steve Jessop
  • Before looping... you can do `import os`... `if os.path.getsize(fin.name) % 8 != 0` - complain about it... then leave the rest of your code as was... – Jon Clements Nov 23 '13 at 12:40
  • Well that's the same thing I'm asking about for your answer. I've written this code as if `read()` behaves like C `read`, and hence can return partial reads for no obvious reason. That's because I haven't (yet) found a guarantee in the Python documentation that it doesn't. `read` returns as soon as it has any data to return, and returns as much as it can at that point without blocking. So if the filename refers to something that does async I/O (maybe a network mapped drive, or a named pipe, whatever the OS supports) then that doesn't necessarily fall on 8-byte multiples. – Steve Jessop Nov 23 '13 at 12:47
  • Indeed... but in that case `getsize` won't work anyway - check line 1052 in http://hg.python.org/cpython/file/08f282c96fd1/Objects/fileobject.c - code speaks louder than words, and then you can draw your own conclusion... Looks like it blocks until we get `n`; if we can't get that for a good reason, it raises an exception, otherwise it keeps trying until we have n and returns it... – Jon Clements Nov 23 '13 at 12:48
  • Note comments in lines 1103-1106 though – Jon Clements Nov 23 '13 at 12:54
  • @JonClements: unfortunately I learned C and C++ before Python, so I don't believe that code speaks louder than documentation. There's a difference between the behavior of CPython and the behavior guaranteed for all correct Python implementations. But still it's useful to see what CPython does, because other implementations might just copy it regardless of what the docs say :-) – Steve Jessop Nov 23 '13 at 12:58
  • I imagine it's difficult to explicitly define the behaviour of `read` on *file-like* objects, but since `open` never became the universal wrapper for "open some resource uri" it wouldn't hurt to define specific behaviour for the methods that actual file objects provide... – Jon Clements Nov 23 '13 at 13:02