Reading Nul character in binary file

Question

I'm having trouble reading the NUL values in a binary file and writing them to a string.
The file is an MS Word file. I converted it to a binary file and opened it in VS Code, where some parts of the file are highlighted in red and shown as small combined characters. Here is what the binary file looks like in VS Code:

And this is what the file looks like in Notepad:

These are the first 5 bytes of the file, printed in binary from Python:
1010000 1001011 11 100 10100
The problem is that Python seems to ignore the parts highlighted in red. I understand that those parts, such as NUL, are very important, so why is Python skipping them, or at the very least changing their values? Is there a way to make Python copy the exact contents of the file, including those red parts?
I have tested other files, ranging from .jpg to .ppt, and none of them has done this before.
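
For reference, here is a minimal sketch of a byte-exact read, assuming the file is opened in `'rb'` mode. The helper names `read_exact_bytes` and `to_bit_strings` and the sample data are hypothetical, made up for illustration; the sample holds the five byte values listed above followed by one NUL byte. Formatting each byte with `"08b"` keeps the leading zeros, so a NUL byte shows up as `00000000` rather than appearing to vanish.

```python
import os
import tempfile


def read_exact_bytes(path):
    """Return the file's contents as a bytes object, with no text decoding."""
    with open(path, "rb") as infile:
        return infile.read()


def to_bit_strings(data):
    """Format every byte as 8 binary digits so leading zeros are kept."""
    return [format(byte, "08b") for byte in data]


# Hypothetical sample: the five byte values from the question, then a NUL byte.
sample = bytes([0b1010000, 0b1001011, 0b11, 0b100, 0b10100, 0x00])

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real file; written and read back in binary mode.
    path = os.path.join(tmp, "wordFile.bin")
    with open(path, "wb") as outfile:
        outfile.write(sample)
    data = read_exact_bytes(path)

bits = to_bit_strings(data)
print(data == sample)  # True if every byte, including NUL, was read back unchanged
print(bits)
```

Those five values are the bytes `P`, `K`, `0x03`, `0x04`, `0x14`; `PK\x03\x04` is the signature that starts a zip local file header.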

Thanks in advance and have a great day!

Can you share the code that you are using to read the file in Python? Are you using `open('myfile.bin','rb')` to read the file as a binary file? — scotty3785, Nov 12 '21 at 09:19
Also, if this is a word file, have you tried unzipping it first to get the xml files contained within? — scotty3785, Nov 12 '21 at 09:27
@scotty3785 I'm using `with open("wordFile.bin", 'rb') as infile:`. I've almost found the problem with my code, actually: it's not in the way it extracts the binary digits but in one of the conditions, so I'll have a look at that and see whether the file is still an issue. By the way, how do you unzip it to get the xml files? Thanks. — Jumanji176, Nov 12 '21 at 10:14
@Jumanji176 If it's a docx, not a doc, then it is just a glorified zip file. Just use Python's zipfile module to read it. Heck, try changing the file extension to .zip and look into it with your file explorer. — Homer512, Nov 12 '21 at 12:20
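
A rough sketch of the `zipfile` suggestion above, assuming the document really is a zip-based .docx. To keep the example self-contained it builds a tiny stand-in archive in memory rather than opening a real Word file; the member name `word/document.xml`, its placeholder content, and the helpers `list_members` and `read_member` are illustrative assumptions only.

```python
import io
import zipfile


def list_members(source):
    """Return the names of the files stored inside a zip-based document."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def read_member(source, name):
    """Return the raw bytes of one file stored inside the archive."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(name)


# Stand-in archive built in memory (hypothetical content, not a real .docx).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
raw = buffer.getvalue()

print(raw[:4])                              # zip signature at the start of the data
print(zipfile.is_zipfile(io.BytesIO(raw)))  # quick check before trying to open it
names = list_members(io.BytesIO(raw))
print(names)
content = read_member(io.BytesIO(raw), "word/document.xml")
print(content)
```

With a real file, a path such as `"wordFile.docx"` could be passed in place of the `io.BytesIO` object, since `zipfile.ZipFile` accepts either a path or a file-like object.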

0 Answers