The line that you point at as a success likely isn't doing what you think it is doing either:
a = r"\x00hello"
That defines a string of 9 characters: a backslash, x, 0, 0, and then the five letters of hello. Calling a.encode('utf-8', errors='ignore') takes that string, encodes its characters as UTF-8, and returns a bytes value holding that encoding (which CtypeObj.Function() accepts).
I would assume that you don't really want that \x00 part passed to the function?
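To see what the raw literal actually contains, here is a small sketch. The variable names encoded and s are just for illustration, and nothing here calls your ctypes function:

```python
# Raw literal: the backslash is kept as an ordinary character.
a = r"\x00hello"
print(len(a))        # 9
print(list(a)[:4])   # ['\\', 'x', '0', '0']

# Encoding turns the 9 characters into 9 bytes; no NUL byte is involved.
encoded = a.encode("utf-8", errors="ignore")
print(type(encoded).__name__)  # bytes
print(encoded)                 # b'\\x00hello'
print(len(encoded))            # 9

# Without the r prefix, \x00 is an escape for a single NUL character.
s = "\x00hello"
print(len(s))        # 6
```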
Reading from a file opened in 'rb' mode gets you a bytes value as well, but those bytes are in whatever encoding the file was saved with. If you need UTF-8 (and the file might not be), then you should instead open the file in 'r' mode (passing encoding= if the file isn't in your platform's default encoding), read the value as a string, and encode it with b.encode('utf-8').
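A rough sketch of the difference, assuming a file that was saved as Latin-1. The sample text, the temporary file, and the choice of Latin-1 are all made up for the demonstration; your file's real encoding may differ:

```python
import os
import tempfile

text = "h\u00e9llo"  # hypothetical sample text with one non-ASCII character

# Write the sample as Latin-1 so the file is deliberately NOT UTF-8.
with tempfile.NamedTemporaryFile(
    "w", encoding="latin-1", suffix=".txt", delete=False
) as f:
    f.write(text)
    path = f.name

try:
    # 'rb' hands back the bytes exactly as stored (Latin-1 here).
    with open(path, "rb") as f:
        raw = f.read()
    # 'r' decodes to str; we have to name the file's actual encoding.
    with open(path, "r", encoding="latin-1") as f:
        decoded = f.read()
finally:
    os.remove(path)

# Re-encode the str to get UTF-8 bytes.
utf8_bytes = decoded.encode("utf-8")

print(raw)         # b'h\xe9llo'
print(utf8_bytes)  # b'h\xc3\xa9llo'
```

The two bytes values differ even though they represent the same text, which is why passing the 'rb' contents straight through only works if the file already happens to be UTF-8.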
And finally this line:
c = b"\x00hello"
This just creates a bytes value of length 6, with the first byte being the 0 byte, and the rest the values for the 5 letters. However, that's not automatically a utf-8 encoding, and certainly not the same as you had before. Again, it would seem you don't want that \x00 at the start, since it's very unusual for a string to start with a null character like that.
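A quick check of those claims. The last part assumes the function you are calling takes a C string (char *), which I can't tell from the question; ctypes.c_char_p is used here only to show how a NUL-terminated read treats a leading 0 byte:

```python
import ctypes

a = r"\x00hello"
c = b"\x00hello"

print(len(c))                  # 6
print(c[0])                    # 0
print(list(c[1:]))             # [104, 101, 108, 108, 111]
print(c == a.encode("utf-8"))  # False

# Assumption: the C side reads a NUL-terminated string.
# Such a read stops at the first 0 byte, so nothing is left.
as_c_string = ctypes.c_char_p(c).value
print(as_c_string)             # b''
```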
As indicated in the comments, r"\x00hello" and 'hello' are both string literals, but that's only meaningful in the context of code. In terms of data, you only have strings of characters (str) and bytes values (sometimes called a string of bytes). A "literal" is a way to write either directly in code:
s = 'hello' # a string literal
b = b'hello' # a bytes literal for the same text (under most encodings)
s == b.decode() # True
b == s.encode() # True
If you read a file using mode 'r', you get strings. If you read a file using mode 'rb', you get bytes.