Because of the notation, I guess you're using Python.
In Python, the b'...' notation is used for bytes objects.
When bytes objects are represented in the source code or on the terminal, every byte that corresponds to a printable ASCII character (values 32 to 126) is shown as that character. All other bytes are escaped using the \xNN notation, where NN is the byte's value as a two-digit hexadecimal number. A few common ones get shorthand escapes instead, such as \n, \t and \r. (str objects follow similar rules for non-printable characters.)
This is why you see a strange mix of printable characters and escape codes.
Note that you can escape printable characters as well: b'\x41' is the same as b'A', since the hexadecimal number 41 (65 in decimal) is the letter A in ASCII. However, the Python interpreter doesn't display printable characters in escaped form by default.
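To check this yourself, here is a minimal sketch. The variable names and the sample byte values are mine, chosen only for illustration:

```python
# The escaped and the literal form describe the same bytes object.
same = b'\x41' == b'A'
print(same)

# repr() shows printable ASCII bytes as characters and escapes the rest.
# 0x41 is 'A', 0x7E is '~'; 0x00, 0x7F and 0xFE are not printable ASCII.
shown = repr(bytes([0x41, 0x00, 0x7E, 0x7F, 0xFE]))
print(shown)
```

Note that 0x7F (DEL) is escaped, which is why the printable range ends at 126.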
How does UTF-16 work?
UTF-16 simply uses 16 bits (= 2 bytes) for every character [1].
There are, however, two ways of ordering the two bytes, called little endian (low byte first) and big endian (high byte first). To decode UTF-16 data, you have to know which byte order was used. Sometimes UTF-16 data starts with a Byte Order Mark (BOM), the special character U+FEFF, which can be used to determine the byte ordering.
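A small sketch of the two byte orders, using Python's explicit 'utf-16-le' and 'utf-16-be' codecs and the BOM constants from the standard codecs module (the variable names are just for illustration):

```python
import codecs

text = 'h'  # code point U+0068

little = text.encode('utf-16-le')  # low byte first
big = text.encode('utf-16-be')     # high byte first
print(little, big)

# The BOM is U+FEFF; the order in which its two bytes appear
# tells a decoder which byte order the rest of the data uses.
bom_le = codecs.BOM_UTF16_LE
bom_be = codecs.BOM_UTF16_BE
print(bom_le, bom_be)
```

The explicit -le and -be codecs never write a BOM themselves; the plain 'utf-16' codec does when encoding.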
Your bytes object b'\x14\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00' consists of 24 bytes, so 12 UTF-16 characters.
I guess your first byte is corrupted somehow, because it results in a strange character. It probably should have been \xff instead of \x14, because when the data starts with the two bytes \xff\xfe, this is a signal that the bytes are stored in little-endian format. (See the byte order mark table on Wikipedia.)
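To see where the strange character comes from, here is a rough sketch that decodes the data from the question explicitly as little endian, so that nothing depends on a BOM being present. The variable names are mine:

```python
data = b'\x14\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00'

n_bytes = len(data)
n_units = n_bytes // 2  # two bytes per UTF-16 unit

# 'utf-16-le' treats every pair of bytes as data, including the first one.
decoded = data.decode('utf-16-le')
first = decoded[0]
rest = decoded[1:]
print(n_bytes, n_units, hex(ord(first)), rest)
```

The first two bytes \x14\xfe decode to U+FE14, an obscure punctuation character, instead of the BOM U+FEFF that \xff\xfe would have produced.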
Finally, decoding the data in Python is very simple. The 'utf-16' codec reads the BOM to determine the byte order and removes it from the result:
b'\xff\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00'.decode('utf-16')
output:
'hello world'
[1] This is not entirely true: characters outside the Basic Multilingual Plane (most emoji, for example) are represented by a combination of two 16-bit units, called a surrogate pair, but you should probably ignore that for now. For (much) more information about UTF-16, see Wikipedia.
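If you are curious about that footnote, this sketch shows one character that needs two 16-bit units. The choice of U+1F600 (an emoji) is mine, purely as an example:

```python
ch = '\U0001F600'  # a single character outside the 16-bit range

encoded = ch.encode('utf-16-le')
n_units = len(encoded) // 2

# Four bytes: the units 0xD83D and 0xDE00, each stored low byte first.
print(len(ch), n_units, encoded.hex())
```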