How can I ignore unicode values and keep the format '\x##'?

Question

I'm working on creating packets to send to a device serially. I want to keep the formatting as I've typed it without having it converted to unicode characters.

my_thing = b'\xb4\x75'
print(my_thing)
(Actual Output)>>> b'\xb4u'
(Wanted Output)>>> b'\xb4\x75'

Probably a simple question but after googling and searching other questions I couldn't find what I was looking for. Thanks in advance!

`my_thing = r'\xb4\x75'`? (Untested: I have only access to Python 2 atm, where the result is likely different - [link](https://docs.python.org/3/whatsnew/3.0.html) says "All backslashes in raw string literals are interpreted literally.") — Leporello, Jun 12 '19 at 15:14
@Leporello this totally changes the actual content of the object, not just in Python 2, but in everything... `my_thing` is a bytes object, of length 2. your suggestion produces a `str` of length 8 — Adam.Er8, Jun 12 '19 at 15:18
@Adam.Er8 Fair enough. The question might have more to do with how `print` behave on bytes objects, but I will refrain from speculating. — Leporello, Jun 12 '19 at 15:30
@Leporello I assumed so because of the serial communication part, but you're right, it's just a speculation as well :( — Adam.Er8, Jun 12 '19 at 15:33
b'\x75' == b'u' They are equivalent in python. I can write a lookup function to lookup the unicode character codes and print them as a string for referencing, but I'm hoping there's some easy way that I'm overlooking. — Tristen Harr, Jun 12 '19 at 15:36
I posted an answer, tried using the builtin `bytes.hex()`, I hope this helps — Adam.Er8, Jun 12 '19 at 15:44

Adam.Er8 · Answer 1 · 2019-06-13T04:27:15.363

1

Assuming this is only a representation/print issue, I'll suggest a custom print function:

my_thing = b'\xb4\x75'

def print_bytes_as_hex(b):
    print("b'{}'".format(''.join(r'\x{:02x}'.format(i) for i in b)))

print_bytes_as_hex(my_thing)

If you copy/eval this function's output, the result should == its input.

NOTE: the string it generates is not itself == to the input, because it is a `str` while the input is a `bytes` object.
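A minimal sketch of both statements above, assuming Python 3. The helper name `format_bytes_as_hex` is made up for illustration (it returns the string instead of printing it), and `ast.literal_eval` is swapped in for a plain `eval` because it only accepts literals:

```python
import ast

def format_bytes_as_hex(data: bytes) -> str:
    """Hypothetical helper: build a bytes-literal-style string, every byte as \\xNN."""
    body = "".join(r"\x{:02x}".format(value) for value in data)
    return "b'{}'".format(body)

my_thing = b"\xb4\x75"
text = format_bytes_as_hex(my_thing)

print(text)                                 # b'\xb4\x75'
print(type(text).__name__, len(text))       # str 11
print(text == my_thing)                     # False: a str never equals a bytes object
print(ast.literal_eval(text) == my_thing)   # True: parsing the literal gives the bytes back
```

The `02` in the format spec matters: without zero padding, a byte such as 0x05 would come out as `\x5`, which is not a valid escape in a bytes literal.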

edited Jun 13 '19 at 04:27

answered Jun 12 '19 at 15:39


1

When you iterate over a byte string, you iterate over integers. So you can avoid the range-step-2 thing and use advanced string formatting instead: `''.join(r'\x{:02x}'.format(i) for i in b)` (inside the first `.format()`). – lenz Jun 12 '19 at 21:32
thanks @lenz, awesome and way simpler. I'm kinda embarrassed I didn't realize that myself :P – Adam.Er8 Jun 13 '19 at 04:28

jsbueno · Answer 2 · 2019-06-13T21:03:01.330

When you type something like thing = b"\xb4", what is stored in the Python code object, once compiled, is just the actual number 0xb4 (180 in decimal). This has nothing to do with Unicode at all. In fact, the separation of bytes and text strings in Python 3 was made exactly so that byte values are one thing and text is another, and one needs an "encoding" to relate one to the other.

Your value would be just the same if you did:

In [155]: thing = bytes([180])                                                                                                                         

In [156]: thing                                                                                                                                        
Out[156]: b'\xb4'

This only becomes a character in Python 3 if converted to a string via an explicit encoding:

In [157]: print(thing.decode("latin1"))                                                                                                                
´

What happens is that, for similarity with Python 2 and the C language itself, byte values that fall in the printable ASCII range (32 to 126) are displayed as ASCII characters. So 0x75 is shown as the ASCII character u, but the internal representation of both numbers in my_thing = b'\xb4\x75' is still a one-byte numeric value each, no matter what their representation with print is. And when you send this bytes object in a binary packet, both numbers 0xb4 and 0x75 will be sent as numeric values.

This is easy to verify if you either iterate through the bytes string, which yields numeric values in the range 0 to 255, or write the values to a file and check that it actually contains only 2 bytes:
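As a small illustration of that point (a sketch, assuming Python 3; the variable names are made up), the two spellings below produce the same object, and only `repr` decides how each byte is shown:

```python
explicit = b"\xb4\x75"   # every byte written as a hex escape
mixed = b"\xb4u"         # same bytes, second one written as an ASCII character

print(explicit == mixed)       # True
print(list(explicit))          # [180, 117]
print(repr(explicit))          # b'\xb4u'
print(repr(b"\x41\x0a\x00"))   # b'A\n\x00'
```

The last line shows the three display styles side by side: a printable ASCII byte appears as its character, a newline as its short escape, and everything else as `\xNN`.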

In [158]: my_thing = b"\xb4\x75"                                                                                                                       

In [159]: my_thing                                                                                                                                     
Out[159]: b'\xb4u'

In [160]: list(my_thing)                                                                                                                               
Out[160]: [180, 117]

In [161]: open("test.bin", "wb").write(my_thing)                                                                                                       
Out[161]: 2

In [162]: !ls -la test.bin                                                                                                                             
-rw-rw-r--. 1 gwidion gwidion 2 Jun 13 17:46 test.bin

(the "2" in the last line of this listing is the file size in bytes, as reported by the Linux shell's "ls")

So the only problem you have, if any, is visualizing your values on the Python side before they are sent. To view the values in the host console, whether printed on the TTY, displayed in a GUI, or rendered in a generated web page, you do the opposite of what you think is taking place: call a method that gives you a text object representing the hex digits of your bytes object, so it can be easily inspected. The bytes object itself has the .hex() method, which fits this purpose:

In [165]: my_thing.hex()                                                                                                                               
Out[165]: 'b475'

So, there you are - the hexdigits for the 2 bytes you are ready to send as a packet, viewed as text - while the contents of my_thing itself are unmodified.

This does not have the "\xYY" prefix, so it is nicer to look at. And if you have to type a lot of values, there is also a method that converts each pair of hex digits into a byte, which is much more convenient for typing literal values: the bytes.fromhex class method. It allows you to type:

In [166]: my_thing = bytes.fromhex("b475")

And this is equivalent to b"\xb4\x75".
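A short sketch of the round trip between the two methods, assuming Python 3. The packet contents and the `spaced_hex` helper are invented for illustration; the helper is just one possible way to get a more readable dump:

```python
# Spaces between the hex pairs are ignored by bytes.fromhex,
# which makes long packets easier to type and read.
packet = bytes.fromhex("b4 75 00 0a")

print(list(packet))                            # [180, 117, 0, 10]
print(packet.hex())                            # b475000a
print(bytes.fromhex(packet.hex()) == packet)   # True

def spaced_hex(data: bytes) -> str:
    """Hypothetical display helper: two hex digits per byte, space separated."""
    return " ".join("{:02x}".format(value) for value in data)

print(spaced_hex(packet))                      # b4 75 00 0a
```

None of these calls change `packet` itself; they only build text for inspection.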

If for some reason you really need the \x prefix to each pair of digits, a one-liner of Python code can manipulate it to generate a string containing a byte-string literal that could be fed to eval, for example - but using bytes.fromhex would still be more readable:

# :02x zero-pads each byte to two hex digits, so 0x05 becomes \x05 rather than the invalid \x5
converter = lambda seq: 'b"{}"'.format("".join(f"\\x{value:02x}" for value in seq))

And on the interactive session:

In [168]: my_thing = b"\xb4\x75"
     ...:
     ...: converter = lambda seq: 'b"{}"'.format("".join(f"\\x{value:02x}" for value in seq))
     ...: print(converter(my_thing))
b"\xb4\x75"

This is just a printout of a text string - a sequence of text characters including the character "b", '"', and so on. To have the bytes literal back one needs to apply eval on that:

In [169]: eval(converter(my_thing))                                                                                                                    
Out[169]: b'\xb4u'
