
Similar to this other question on decoding a hex string, I have some code in a Python 2.7 script which has worked for years. I'm now trying to convert that script to Python 3.

OK, I apologize for not posting a complete question initially. I hope this clarifies the situation.

The issue is that I'm trying to convert an older Python 2.7 script to Python 3.8. For the most part the conversion has gone ok, but I am having issues converting the following code:

# get Register Strings
RegString = ""
for i in range(length):
    if regs[start+i]!=0:
        RegString = RegString + str(format(regs[start+i],'x').decode('hex'))

Here is some supporting data:

regs[start+0] = 20341
regs[start+1] = 29762

I think my Python 2.7 code converts these to the hex strings "4f75" and "7442", and then to the characters "Ou" and "tB", respectively.
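The arithmetic behind that reading can be checked in Python 3. Each four-digit hex string is two bytes, so each register packs one character into its high byte and one into its low byte. A minimal sketch, assuming the registers are plain integers:

```python
# Sample register values from the question (assumed to be plain ints).
for value in (20341, 29762):
    high, low = divmod(value, 256)  # split the 16-bit value into high and low bytes
    print(f"{value:#06x}", chr(high) + chr(low))
```

This prints `0x4f75 Ou` and then `0x7442 tB`, matching the characters described above.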

In Python 3 I get this error:

'str' object has no attribute 'decode'

My goal is to modify my Python 3 code so that the script will generate the same results.

RDK
  • It'd be a lot more helpful if we knew the type of values in `regs`, and even better, you included a sample value and the expected output. – Martijn Pieters Jan 11 '20 at 16:00
  • As a starting point, the equivalent of Python 2 `""` in Python 3 is `b""`. Python 3 `""` is Python 2 `u""`. – chepner Jan 11 '20 at 16:09
  • @chepner: it's never that simple. Depending on the context, `""` in Python 2 should be translated to `""` in Python 3 too. That's because the APIs all changed too, not just the unicode vs bytes string object types. – Martijn Pieters Jan 11 '20 at 16:15
  • I've closed this question as it is not clear enough. I'm leaving my answer up for now, but I've made assumptions that I'm not sure are actually correct. Please provide the necessary input sample and expected output (just capture some example value in the running Python 2 script with `print start, length, repr(regs), repr(RegString)` so we get accurate object representations). If you know what the values are supposed to represent (data from the Windows registry, data received from some source that represents text or an image, etc.) that'd be helpful too. – Martijn Pieters Jan 11 '20 at 16:22
  • That's a usage issue, not a type issue. Recognizing what types are actually being used is the first step to figuring which types *should* be used. – chepner Jan 11 '20 at 16:22
  • I will study your answer and comments. If I'm still confused I will repost/edit the question with supplementary data. – RDK Jan 11 '20 at 16:35

1 Answer


str(format(regs[start+i],'x').decode('hex')) is a very verbose and roundabout way of turning the non-zero integer values in regs[start:start + length] into individual characters of a bytestring (str in Python 2 should really be seen as a sequence of bytes). It first converts an integer value into a hexadecimal representation (a string), decodes that hexadecimal string into a series of characters, and then calls str() on the result, which is redundant because the value is already a string. Assuming that the values in regs are integers in the range 0-255 (or even 0-127), in Python 2 this should really have used the chr() function.
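A sketch to test that assumption: for byte-sized values the hex round trip and chr() agree, but only from 16 to 255, because smaller values format to a single hex digit, which `.decode('hex')` (and Python 3's `bytes.fromhex()`) reject as odd-length. The sample data in the question are 16-bit values, however, and there chr() gives a different result. The helper name `hex_round_trip` is hypothetical.

```python
def hex_round_trip(value):
    """Hypothetical helper: text version of Python 2's format(value, 'x').decode('hex')."""
    # bytes.fromhex() replaces .decode('hex'); Latin-1 maps each byte to the
    # character with the same number, much like indexing a Python 2 str.
    return bytes.fromhex(format(value, 'x')).decode('latin-1')

# For byte-sized values (16-255) the round trip is exactly chr().
assert all(hex_round_trip(v) == chr(v) for v in range(16, 256))

# For the 16-bit sample value from the question it is not.
print(hex_round_trip(20341))   # Ou
print(chr(20341) == '\u4f75')  # True: a single CJK code point, not 'Ou'
```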

If you want to preserve the loop, use chr() to get a str value, or bytes([...]) if you need a binary value:

RegString = ""
for codepoint in regs[start:start + length]:
    if codepoint:  # skip zero registers, as the original loop did
        RegString += chr(codepoint)

or

RegString = b""
for codepoint in regs[start:start + length]:
    if codepoint:  # skip zero registers, as the original loop did
        RegString += bytes([codepoint])

Since this is actually converting a sequence of integers, you can just pass the whole lot to bytes() and filter out the zeros as you go:

# only take non-zero values
RegString = bytes(b for b in regs[start:start + length] if b)

or remove the nulls afterwards:

RegString = bytes(regs[start:start + length]).replace(b"\x00", b"")

If the result is supposed to be a string rather than a bytes value, decode it with whatever encoding is appropriate: ASCII if the integers are in the range 0-127, or a more specific codec otherwise. In Python 2 this code produced a bytestring, so look for other hints in the code as to what encoding the original author may have been using.
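For 16-bit registers like the ones in the question's sample data, where each value appears to pack two ASCII characters, a minimal sketch of a drop-in Python 3 replacement for the original loop keeps the hex round trip and decodes the joined bytes at the end. The helper name `registers_to_string` and the ASCII codec are assumptions based only on the two sample values.

```python
def registers_to_string(regs, start, length):
    """Hypothetical helper: a Python 3 port of the original Python 2 loop."""
    raw = b"".join(
        bytes.fromhex(format(value, 'x'))  # same hex round trip as the Python 2 code
        for value in regs[start:start + length]
        if value != 0                      # skip zero registers, as before
    )
    # Python 2 produced a bytestring; decode with the codec the data uses
    # (ASCII is assumed here from the two sample values).
    return raw.decode('ascii')

# Usage with the sample register values from the question.
print(registers_to_string([20341, 29762], 0, 2))  # OutB
```

Like the original, this raises an error for non-zero values below 16 (odd-length hex), so it reproduces the Python 2 behaviour rather than silently changing it.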

Martijn Pieters