Transform ordinary string (which is supposed to be a binary string) back into a binary string

Question

I am currently working with the simple-crypt library.

I have managed to transform an input string into its binary string.

        pw_data = input("Please type in your p!")  # enter password
        pw_data_confirmed = input("Please confirm!")
        _platform = input("Please tell me the platform!")  # belonging platform
        if pw_data == pw_data_confirmed:  # check confirmed pw
            print("Received!")

            salt_data = "AbCdEfkhl"  # salt key

            ciphertext = encrypt(salt_data, pw_data.encode("utf8"))  # encrypt pw with salt key

Binary string e.g: b'sc\x00\x02X\xd8\x8ez\xbfB\x03s\xc5\x8bm\xecp\x19\x8d\xd6lqW\xf1\xc3\xa4y\x8f\x1aW)\x9bX\xfc\x0e\xa4\xf2ngJj/]{\x80\x06-\x07\x8cQ\xeef\x0b\x02?\x86\x19\x98\x94eW\x08}\x1d8\xdb\xe57\xf7\x97\x81\xb6\xc7\x08\n^\xc9\xc0'

This binary string is then stored in a Word document.

The problem now is: as soon as I read the document and fetch this specific binary string, it is no longer recognized as a binary string. Instead, it is now of type str.

import docx  # python-docx
from simplecrypt import decrypt

p_loc = input("Which platform do you need?")
doc_existing = docx.Document(r"xxx")
text = []
for i in doc_existing.paragraphs:
    text.append(i.text)

for pos, i in enumerate(text):
    if i == p_loc:
        len_pos = len(text[pos+1])
        p_code = text[pos+1][2:len_pos-1]  # strip the leading b' and trailing '; the result is an ordinary str
print(p_code.encode("utf8"))  # after .encode(), every \ in my binary code shows up as \\


salt_data = "AbCdEfkhl"

plain = decrypt(salt_data, p_code)

print(plain)

p_code without .encode statement (as a string, not bytestring!): sc\x00\x02X\xd8\x8ez\xbfB\x03s\xc5\x8bm\xecp\x19\x8d\xd6lqW\xf1\xc3\xa4y\x8f\x1aW)\x9bX\xfc\x0e\xa4\xf2ngJj/]{\x80\x06-\x07\x8cQ\xeef\x0b\x02?\x86\x19\x98\x94eW\x08}\x1d8\xdb\xe57\xf7\x97\x81\xb6\xc7\x08\n^\xc9\xc0

When I now print out p_code.encode("utf8") I get the following result: b'sc\\x00\\x02X\\xd8\\x8ez\\xbfB\\x03s\\xc5\\x8bm\\xecp\\x19\\x8d\\xd6lqW\\xf1\\xc3\\xa4y\\x8f\\x1aW)\\x9bX\\xfc\\x0e\\xa4\\xf2ngJj/]{\\x80\\x06-\\x07\\x8cQ\\xeef\\x0b\\x02?\\x86\\x19\\x98\\x94eW\\x08}\\x1d8\\xdb\\xe57\\xf7\\x97\\x81\\xb6\\xc7\\x08\\n^\\xc9\\xc0'

So the problem is that, compared with the original binary string, the second one has an extra \ before every escape sequence. As a consequence, I cannot decrypt it, because it is not recognized as the original binary string.

So my question is: is there a simple way to transform a string that already looks like a binary string back into an actual binary string, so that it is identical to the original? Or is there a way to remove the second \ so that I have the original binary string again?

I am very grateful for any help!!

"This binary string will then be stored in a word document." It's not clear what you mean by this. The example "binary string" you give (`b'sc\x00\x02...`) is not a string. It is a bytes object. The term "binary string" is misleading. How are you storing these bytes into the document? Are you using python 2 or 3? — Tom Dalton, Feb 05 '21 at 13:21
I'm sorry if I explained that unclearly. First of all, I use Python 3.8. What I do is: I take an input string, e.g. "Hellothisisme", and encrypt it with simple-crypt. Before applying simple-crypt, I transform the string into a binary string with encode("utf8"). This binary string is then appended to a Word document with docx. The problem: once it is stored in the Word document, it is of type string. So when I read it back, I get a string that merely looks like the binary string and is not recognized as a bytestring. — manumanu12, Feb 05 '21 at 13:27
I'd avoid using the term "binary string", if you have a string, and you encode it with utf-8, the result is bytes. How are you storing those bytes into the word document/file? — Tom Dalton, Feb 05 '21 at 13:32
mydoc = docx.Document() mydoc.add_heading("Stored data") mydoc.add_paragraph("") mydoc.add_paragraph("") mydoc.add_paragraph(f"{_platform}") mydoc.add_paragraph(f"{ciphertext}") mydoc.save(r"xxx") This is how I stored it. ciphertext = encrypt(salt_data, pw_data.encode("utf8")) . The output of ciphertext is the binary string which I post in the document then as a string. — manumanu12, Feb 05 '21 at 13:34

score 1 · Accepted Answer · answered Feb 05 '21 at 13:48

OK. When you do f"{ciphertext}", you are telling Python to store the string representation of those bytes, as text, in the doc.

E.g.

>>> b = b"\x00\x01\x65\x66"
>>> print(f"{b}")
b'\x00\x01ef'
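
To make the difference concrete, here is a minimal sketch of what ends up in the document and how that text relates to the original bytes. The variable names are only illustrative. It also shows that `ast.literal_eval` from the standard library can parse such a stored representation back into bytes, assuming the text still includes the leading `b'` and the closing quote (the slicing in the question strips them) and that nothing altered the text in between.

```python
import ast

original = b"\x00\x01\x65\x66"

# f"{original}" produces the repr: the characters b, ', \, x, 0, 0, and so on.
stored_text = f"{original}"

# Encoding that text yields the bytes of the repr, not the original bytes.
# Each backslash is now a real byte, which is why it prints as \\.
reencoded = stored_text.encode("utf8")

# literal_eval parses the repr as a Python bytes literal and rebuilds the bytes.
recovered = ast.literal_eval(stored_text)

print(len(original), len(reencoded))  # the lengths differ
print(recovered == original)
```

This can help with data that has already been saved in this form, but it depends on the text being an intact Python literal, so it is more fragile than storing a proper text encoding in the first place.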

You (probably) don't really want to store b'\x00\x01ef' in your word doc. A good general way to store binary data in text form is to use a different encoding. Base64 is a commonly used encoding that is intended to store binary data in a text-based form.

See https://docs.python.org/3/library/base64.html for more information.

In your case, you could do something like this:

import base64

cipher_b64_b = base64.b64encode(ciphertext)
cipher_b64 = cipher_b64_b.decode() # cipher_b64 is now a string.
# Now store this cipher_b64 string in your word document

...

# Now you fetch p_code (which is now a base64 string) from your word doc
cipher_b64_b = p_code.encode()
cipher = base64.b64decode(cipher_b64_b)

This results in your original binary ciphertext. The Word document will contain a Base64-encoded string like "AAFlZg==", which avoids the issues with escape sequences in your Word document.
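
As a self-contained check of the round trip, here is a small sketch that runs without simple-crypt or docx. The `ciphertext` value is just a stand-in for whatever `encrypt` returns, and the helper names `bytes_to_text` and `text_to_bytes` are made up for this example.

```python
import base64

# Stand-in for the bytes returned by encrypt(); any bytes object works here.
ciphertext = b"sc\x00\x02X\xd8\x8ez\xbfB\x03s"


def bytes_to_text(data: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only Base64 string."""
    return base64.b64encode(data).decode("ascii")


def text_to_bytes(text: str) -> bytes:
    """Decode a Base64 string back into the original bytes."""
    return base64.b64decode(text.encode("ascii"))


stored = bytes_to_text(ciphertext)  # the text you would pass to add_paragraph()
restored = text_to_bytes(stored)    # the bytes you would pass to decrypt()

print(stored)
print(restored == ciphertext)
```

Because the Base64 alphabet contains no backslashes or quotes, the text read back from the paragraph can be decoded directly, without any slicing.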

Now it works perfectly. Thank you so much for your time! This really helped me out and I learned a lot! — manumanu12, Feb 05 '21 at 14:06
