2

I have a string in hex:

Hex = 'E388854083969497A4A38599408881A2409985829696A38584408699969440814082A48783888583924B'

As a byte object it looks like this:

b'\xe3\x88\x85@'b'\xe3\x88\x85@\x83\x96\x94\x97\xa4'b'\xe3\x88\x85@'b'\xe3\x88\x85@\x83\x96\x94\x97\xa4'b'\xe3\x88\x85@\x83'b'\xe3\x88'b'\xe3\x88\x85@\x83\x96\x94\x97\xa4'

In EBCDIC it is this:

The computer has rebooted from a bugcheck.

So I know that hex 40 (\x40) is a space in EBCDIC and an '@' in ASCII.

I can't figure out why Python, when printing the byte objects, prints '@' instead of '\x40'.

My test code sample is:

import codecs
Hex = 'E388854083969497A4A38599408881A2409985829696A38584408699969440814082A48783888583924B'

output = []
DDF = [4,9,4,9,5,2,9]
distance = 0

# This breaks my hex string into chunks based off the list 'DDF'
for x in DDF:
    output.append(Hex[distance:x*2+distance])
    distance += x*2

# This prints out the list of hex strings
for x in output:
    print(x)

# This prints out the byte objects in the list
for x in output:
    x = codecs.decode(x, "hex")
    print(x)

# The next lines print the correct text
Hex = codecs.decode(Hex, "hex")
print(codecs.decode(Hex, 'cp1140'))

The output of the above is:

E3888540
83969497A4A3859940
8881A240
9985829696A3858440
8699969440
8140
82A48783888583924B
b'\xe3\x88\x85@'
b'\x83\x96\x94\x97\xa4\xa3\x85\x99@'
b'\x88\x81\xa2@'
b'\x99\x85\x82\x96\x96\xa3\x85\x84@'
b'\x86\x99\x96\x94@'
b'\x81@'
b'\x82\xa4\x87\x83\x88\x85\x83\x92K'
The computer has rebooted from a bugcheck.

So I guess my question is: how can I get Python to print the byte object as '\x40' instead of '@'?

Thank you so much for your help :)

Michael Nolan

2 Answers

3

I think your byte array is slightly off.

According to this, you need to use 'cp500' for decoding. For example:

my_string_in_hex = 'E388854083969497A4A38599408881A2409985829696A38584408699969440814082A48783888583924B'
my_bytes = bytearray.fromhex(my_string_in_hex)
print(my_bytes)

my_string = my_bytes.decode('cp500')
print(my_string)

output:

bytearray(b'\xe3\x88\x85@\x83\x96\x94\x97\xa4\xa3\x85\x99@\x88\x81\xa2@\x99\x85\x82\x96\x96\xa3\x85\x84@\x86\x99\x96\x94@\x81@\x82\xa4\x87\x83\x88\x85\x83\x92K')
The computer has rebooted from a bugcheck.

When you print the bytearray, it will still show a '@', but it is actually \x40 "under the covers". What you see is just the __repr__() of the object. Because this method takes no encoding parameter, it cannot decode the bytes with your code page; it simply builds a readable string for printing purposes, showing printable ASCII bytes as characters.

__repr__() or repr() is just that: a representation of the object, not the object itself. The byte is not actually a '@'; Python just uses that character when printing. It is still a bytearray, not a string.

When decoding it will properly decode, using the code-page selected.
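To make that concrete, here is a minimal sketch using only built-ins. It shows that '@' and \x40 are the same byte, and one way to build a string that shows every byte as an escape. The helper name escape_all is made up for illustration; it is not part of the standard library.

```python
data = bytes.fromhex('E3888540')

# The repr shows printable ASCII bytes as characters, but the value is unchanged.
assert data[3] == 0x40
assert b'@' == b'\x40'

def escape_all(raw):
    """Return a string showing every byte as a \\xNN escape (hypothetical helper)."""
    return ''.join('\\x{:02x}'.format(b) for b in raw)

print(escape_all(data))  # \xe3\x88\x85\x40
```

The result is an ordinary str meant for display only; the underlying bytes object is untouched and still decodes the same way.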

Edwin van Mierlo
  • Thank you for the link to __repr__(). I just did a test printing out every hex value's byte object. All of the ones that have a printable ASCII character were represented by that character, and the non-printable ones were displayed in their byte format. – Michael Nolan Feb 27 '18 at 16:23
  • 1
    As far as cp500 vs cp1140 goes, I just have to go look at some ancient server. I think it's actually going to be cp37, but I don't know yet. – Michael Nolan Feb 27 '18 at 16:24
  • @Keanwood yes, I am sure you can find a codec to decode properly, I chose cp500 as an example. You need to make the choice of what codec to use. – Edwin van Mierlo Feb 27 '18 at 16:28
1

When you print a bytes object, Python shows its repr: every byte that corresponds to a printable ASCII character is shown as that character, and the rest are shown as \xNN escapes. If you need the full hex string printed, use binascii.hexlify():

import binascii
import codecs

Hex = 'E388854083969497A4A38599408881A2409985829696A38584408699969440814082A48783888583924B'

# Convert the hex string to bytes, then back to a lowercase hex bytes object
print(binascii.hexlify(codecs.decode(Hex, 'hex')))
# b'e388854083969497a4a38599408881a2409985829696a38584408699969440814082a48783888583924b'
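As a side note, on Python 3.5 and later the bytes.hex() method should give the same digits as a str rather than a bytes object, which avoids the b'...' wrapper when printing. A small sketch, using just the first nine bytes of the sample string:

```python
import binascii

# First nine bytes of the sample from the question
raw = bytes.fromhex('E388854083969497A4')

as_bytes = binascii.hexlify(raw)  # bytes object
as_text = raw.hex()               # plain str

print(as_bytes)  # b'e388854083969497a4'
print(as_text)   # e388854083969497a4
```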
cowbert
  • Do you know why Python only tries to decode hex 40 and hex 4B? Oh, actually that makes sense. I'll go double-check my ASCII/EBCDIC chart. Maybe those were the only two printable ASCII characters in that sample. – Michael Nolan Feb 27 '18 at 16:06
  • 1
    I just checked. It looks like hex 40 and hex 4B are the only values in the sample that are printable ASCII characters (within 0-127). – Michael Nolan Feb 27 '18 at 16:19
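For anyone who wants to repeat that check, here is a rough sketch. It assumes "printable" means the ASCII range 0x20 through 0x7e, which is approximately what the bytes repr shows as literal characters; the variable names are just for illustration.

```python
sample = bytes.fromhex(
    'E388854083969497A4A38599408881A2409985829696A38584'
    '408699969440814082A48783888583924B'
)

# Collect the distinct byte values that fall in the printable ASCII range
printable = sorted({b for b in sample if 0x20 <= b <= 0x7e})

print([hex(b) for b in printable])  # ['0x40', '0x4b']
```

Every other byte in the sample is 0x81 or higher, so it falls outside ASCII and is printed as a \xNN escape.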