
I'm having trouble reading the NUL values in a binary file and writing them to a string.
The file I have is an MS Word file. I converted it into a binary file and looked at it in VS Code, but some parts of the file are highlighted in red and shown as small combined characters. Here is what the binary file looks like in VS Code: [image]

And this is what the file looks like in Notepad: [image]

This is what the first 5 bytes of the file look like in binary when converted in Python:
1010000 1001011 11 100 10100
The problem is that Python seems to ignore the parts highlighted in red. I understand that those parts, such as NUL, are very important, so why is Python skipping them, or at the very least changing their values? Is there a way to make Python copy the exact contents of the file, including the red parts?
I have tested other files before, ranging from .jpg to .ppt, and none of them has shown this behaviour.
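For reference, here is a minimal sketch of reading a file byte for byte and printing each byte with zero padding. It is not the code from this question: the names `read_bytes_exact`, `to_bit_strings` and `sample.bin` are made up for illustration, and the sample data is just the five byte values quoted above followed by two NUL bytes. Without padding, a byte such as 0x03 prints as `11` and a NUL byte prints as `0`, which can make it look as though bytes were dropped or changed.

```python
import tempfile
from pathlib import Path


def read_bytes_exact(path):
    """Return the file's contents as a bytes object, NUL bytes included."""
    # 'rb' opens the file in binary mode: no decoding, no newline translation.
    with open(path, "rb") as infile:
        return infile.read()


def to_bit_strings(data):
    """Format each byte as exactly 8 binary digits so leading zeros are kept."""
    return [format(byte, "08b") for byte in data]


# Hypothetical sample: the five byte values quoted above plus two NUL bytes.
sample = bytes([0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x00])

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.bin"
    sample_path.write_bytes(sample)
    data = read_bytes_exact(sample_path)

# The bytes read back are identical to the bytes written, NULs included.
print(data == sample)
print(" ".join(to_bit_strings(data)))
```

Iterating over a `bytes` object yields integers from 0 to 255, so a NUL byte is simply the integer 0 and is never skipped in binary mode.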

Thanks in advance and have a great day!

Jumanji176
  • Can you share the Python code that you are using to read the file? Are you using `open('myfile.bin', 'rb')` to read it as a binary file? – scotty3785 Nov 12 '21 at 09:19
  • Also, if this is a Word file, have you tried unzipping it first to get the XML files contained within? – scotty3785 Nov 12 '21 at 09:27
  • @scotty3785 I'm using `with open("wordFile.bin", 'rb') as infile:`. I've almost found the problem with my code, actually. It's not in the way it extracts the binary digits but in one of the conditions, so I'll have a look at that and see whether the file is still an issue. By the way, how do you unzip it to get the XML files? Thanks. – Jumanji176 Nov 12 '21 at 10:14
  • @Jumanji176 If it's a docx, not a doc, then it is just a glorified zip file. Just use Python's zipfile module to read it. Heck, try changing the file extension to .zip and look into it with your file explorer. – Homer512 Nov 12 '21 at 12:20
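The zipfile suggestion in the comments could look roughly like the sketch below. The helper names `list_members` and `read_member` are hypothetical, and the archive is a stand-in built in memory so the example runs without a real document; with an actual .docx you would pass its path instead of `buffer`.

```python
import io
import zipfile


def list_members(source):
    """Return the names of the files stored inside a zip archive such as a .docx."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def read_member(source, name):
    """Return one archive member as raw bytes."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(name)


# Stand-in archive (an assumption for illustration, not a real Word document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(zipfile.is_zipfile(buffer))  # checks for a valid zip structure
print(list_members(buffer))
print(read_member(buffer, "word/document.xml"))
```

A zip archive starts with the bytes `PK\x03\x04`, which match the first four values quoted in the question (1010000, 1001011, 11, 100), so the file is probably a .docx rather than an older .doc.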
