
I built a Python steganographer that hides UTF-8 text in images, and it works fine for that. I was wondering if I could encode complete files in images. For this, the program needs to read all kinds of files. The problem is that not all files are encoded with UTF-8, so I have to read them with:

file = open('somefile.docx', encoding='utf-8', errors='surrogateescape')

When I copy a file read this way into a new file and then try to open the copy, it says that the file cannot be read. I need a way to read all kinds of files and later write them back so that they still work. Is there a way to do this in Python 3?

Thanks.

TheRandomGuy
  • Read files as binary, not as text. – Daniel Jun 10 '17 at 08:11
  • Python 3 has a `codecs` module for this kind of thing. Check it out. – cs95 Jun 10 '17 at 08:14
  • @Daniel Thanks for the idea. But how do I hide the `bytes` object in the image? It is not a string, converting it to a string doesn't work, and `bytes` has no `encode` method. How do I proceed? – TheRandomGuy Jun 10 '17 at 08:33
  • For some reason the code you link to expects a string `message`. That doesn't really make any sense (there's a reason why it calls `str2bin(message)` immediately). Take it as an exercise and rewrite it so it expects a bytes `message` right from the start. (Hint: Mainly this involves throwing out unnecessary code.) – Tomalak Jun 10 '17 at 08:36
  • @Tomalak I did this. I edited the `str2bin` function to solve that problem:

        def str2bin(message):
            if type(message) is str:
                message = message.encode('utf-8')
            binary = bin(int.from_bytes(message, 'big'))
            return binary[2:]

    – TheRandomGuy Jun 10 '17 at 08:43
  • Nice, that's one way to do it! – Tomalak Jun 10 '17 at 08:49
  • It's a philosophical point, but I would change the program so that it *always* expects a `bytes` message. That way the responsibility for what kind of bytes (UTF-8, ASCII, whatever) are passed in lies in the hands of the user. You know, [*"explicit is better than implicit"*](https://www.python.org/dev/peps/pep-0020/). In the end it's the same ambiguity that made you stumble: you don't encode "text or bytes" into the image; in reality you encode bytes and bytes only. That, or I'd make a `hide_text()` method. – Tomalak Jun 10 '17 at 08:51
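Following the "always expect `bytes`" suggestion, here is a minimal sketch of a bytes-only replacement for `str2bin`. The names `bytes_to_bits` and `bits_to_bytes` are hypothetical and not part of the linked code. The sketch also formats every byte as exactly 8 bits, because `bin(int.from_bytes(message, 'big'))[2:]` drops leading zero bits (for `b'\x00\x01'` it yields just `'1'`), so the original byte boundaries are lost unless the length is stored somewhere else.

```python
def bytes_to_bits(message: bytes) -> str:
    """Turn a bytes message into a string of '0'/'1', exactly 8 bits per byte."""
    if not isinstance(message, (bytes, bytearray)):
        # Text must be encoded explicitly by the caller, e.g. text.encode("utf-8")
        raise TypeError("message must be bytes; encode text explicitly first")
    # format(b, "08b") keeps leading zeros, unlike bin(int.from_bytes(...))
    return "".join(format(b, "08b") for b in message)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits: read the bit string back in 8-bit chunks."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Text is encoded by the caller; file contents are passed in as read with "rb".
text_bits = bytes_to_bits("héllo".encode("utf-8"))
binary_bits = bytes_to_bits(b"\x00\x01PK")
print(bits_to_bytes(binary_bits) == b"\x00\x01PK")
```

With this split, a `hide_text()` wrapper would just call `.encode("utf-8")` and pass the result on, which keeps the text/bytes decision explicit.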



Change your view. You don't "hide UTF-8 text in images". You hide bytes in images.

These bytes could be - purely accidentally - interpretable as UTF-8-encoded text. But in reality they could be anything.

Reading a file as text with open("...", encoding="...") has the hidden step of decoding the bytes of the file into a string. This is convenient when you want to treat the file contents as a string in your program.

Skip that hidden decoding step and read the file as bytes: open("...", "rb").
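A small sketch of the difference, assuming a throwaway temporary directory and some made-up binary content. Reading with `"rb"` and writing with `"wb"` copies the bytes exactly, while strict UTF-8 text mode fails on a byte like `0xff`. Note also that text mode with the default `newline=None` translates `\r\n` to `\n` on reading, so even a text-mode round trip that decodes successfully is not guaranteed to be byte-exact.

```python
import os
import tempfile

# Arbitrary binary content: a ZIP signature (a .docx is a ZIP archive),
# a NUL byte, a CR/LF pair and 0xff, which is never valid in UTF-8.
original = b"PK\x03\x04\x00\r\n\xff"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.bin")
    dst = os.path.join(tmp, "copy.bin")
    with open(src, "wb") as f:
        f.write(original)

    # Binary mode: no decoding, no newline translation, just raw bytes.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data)
    with open(dst, "rb") as f:
        copied = f.read()

    # Text mode has to decode first; strict UTF-8 decoding fails on 0xff.
    try:
        with open(src, encoding="utf-8") as f:
            f.read()
        text_mode_ok = True
    except UnicodeDecodeError:
        text_mode_ok = False

print(type(data), copied == original, text_mode_ok)
```

The `data` object is what gets handed to the hiding function; no encoding choice is involved at any point.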

Tomalak
  • I have a question. How do I convert the `bytes` object to an integer to hide? Using the `int.from_bytes` function? – TheRandomGuy Jun 10 '17 at 08:36
  • Exactly. `int.from_bytes()` expects bytes as input. `"someString".encode("UTF-8")` turns a string into bytes. That's one more step than you really need. – Tomalak Jun 10 '17 at 08:44
  • I'll make it a bytes-only function. Thanks very much. – TheRandomGuy Jun 10 '17 at 08:55
  • The code works fine for simple files like .txt and .py, but when I use it for .pdf and .docx, the extracted file is reported as corrupt. Can you tell why? I open the file with `file = open(self.docname, mode='rb')` and then call `funcs.hide(self.filepath, file.read(), bits)`. The file size also drops from 40 KB to 3 KB. Is it because a NUL character is present in the file? – TheRandomGuy Jun 11 '17 at 13:53
  • No, but that is good material for a separate question. Prepare the shortest-possible self-contained code that reproduces the issue and ask a new question here. (If I had to guess it has something to do with the padding the function you are using does.) – Tomalak Jun 11 '17 at 13:59
  • I've added a new question https://stackoverflow.com/questions/44484791/python-steganographer-file-handling-error-for-non-plain-text-files. – TheRandomGuy Jun 11 '17 at 14:13
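For the `int.from_bytes()` approach discussed in the comments, a minimal sketch of the way back: `int.to_bytes()` needs an explicit length, and leading zero bytes do not survive in the integer, so the byte count has to be stored alongside the hidden data somehow (how to store it in the image is left open here). The helper names `bytes_to_int` and `int_to_bytes` are hypothetical.

```python
def bytes_to_int(message: bytes):
    """Return (big-endian integer value of message, length of message in bytes)."""
    return int.from_bytes(message, "big"), len(message)


def int_to_bytes(value: int, length: int) -> bytes:
    """Rebuild the original bytes; the stored length restores leading zero bytes."""
    return value.to_bytes(length, "big")


payload = b"\x00\x00%PDF"
value, length = bytes_to_int(payload)

# Without the stored length, the two leading NUL bytes cannot be recovered.
naive = value.to_bytes((value.bit_length() + 7) // 8, "big")
print(naive)                                   # b'%PDF'
print(int_to_bytes(value, length) == payload)  # True
```

This is a general property of the integer conversion and not a diagnosis of the size change described above.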