
Python 3.6

I encoded a string as UTF-8 and got this:

b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au'

I now want that chunk of ASCII back in string form, without the little b for bytes at the beginning.

BUT I don't want it decoded back from UTF-8. I want the same sequence of characters that you see above in my Python string.

How can I do so? All I can find are ways of converting bytes to string along with encoding or decoding.

Duke Dougal
  • Still none of the answers is accepted. Is there anything unclear in my answer? Or just forgot to click? I like to have things done and finished, so the future potential visitors see that the question was answered. – Claudio Apr 27 '17 at 11:31

2 Answers


The (wrong) answer is quite simple:

chr(asciiCode)

In your special case:

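myString = ""
# Iterating over a bytes object yields integers (0-255), one per byte.
for byte in b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au':
    # chr() maps each integer to the character with that code point.
    myString += chr(byte)
print(myString)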

gives:

ææ²¡æçµ@xn--ssdcsrs-2e1xt16k.com.au
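As a side note, the loop above should be equivalent to decoding the bytes as Latin-1 (ISO 8859-1), since that encoding maps every byte value 0-255 to the code point with the same number. A minimal sketch, where the names `data`, `via_chr` and `via_latin1` are only illustrative:

```python
data = b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au'

# Build the string byte by byte, as in the loop above.
via_chr = "".join(chr(byte) for byte in data)

# Latin-1 maps byte value N to code point N, so this gives the same string.
via_latin1 = data.decode("latin-1")

print(via_chr == via_latin1)
print(len(data), len(via_latin1))  # one character per byte
```

Some of the resulting characters (for example the one for `\x88`) are control characters, so they may not be visible when the string is printed.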

Maybe you are also interested in the right answer? It will probably not please you, because it says you ALWAYS have to deal with encoding/decoding ... because myString is now both UTF-8 and ASCII at the same time (exactly as it already was before you "converted" it to ASCII).

Note that how myString appears when you print it depends on the implicit encoding/decoding used by print.
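To illustrate why the output depends on the encoding, here is a small sketch that encodes the same string in two ways. It assumes myString was built from the bytes with chr() as above; the names `data`, `as_latin1` and `as_utf8` are made up for the example:

```python
data = b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au'
myString = "".join(chr(byte) for byte in data)

# Encoding with Latin-1 gives back exactly the original bytes.
as_latin1 = myString.encode("latin-1")

# Encoding with UTF-8 turns every character above 127 into two bytes,
# so the result is longer and differs from the original data.
as_utf8 = myString.encode("utf-8")

print(as_latin1 == data)
print(len(as_latin1), len(as_utf8))
```

The same str therefore ends up as different bytes on the way out, depending on which encoding the output stream applies.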

In other words ...

there is NO WAY to avoid encoding/decoding

but there is a way of doing it implicitly.

Reading my answer to "Converting UTF-8 (in literal) to Umlaute" should help you a lot in understanding the whole encoding/decoding thing.

Claudio

What you have there is not ASCII, as it contains, for instance, the byte \xe6, which is higher than 127. It's still UTF-8.
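A quick way to check this yourself, as a rough sketch (the variable names are only for illustration): decoding as ASCII fails, while decoding as UTF-8 gives back readable text.

```python
data = b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au'

# ASCII only covers byte values 0-127, so any higher byte rules it out.
has_non_ascii = any(byte > 127 for byte in data)

try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# The same bytes are valid UTF-8 and decode to the original text.
original = data.decode("utf-8")

print(has_non_ascii, ascii_ok)
print(original)
```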

The representation of the bytes object (with the 'b' at the start, then a ', then a '\', ...) is ASCII. You get it with repr(yourstring). But the contents of the object that you're printing are UTF-8.

I don't think you need to turn that back into a UTF-8 string, though it may depend on the rest of your code.

RemcoGerlich
  • Each individual character is entirely valid to go into a Python string. How can I get them in there? Don't think about decoding/encoding. I don't want it decoded/encoded, I just want each character into my string. – Duke Dougal Apr 23 '17 at 09:33
  • Well, `s = "b'\\xe6\\x88\\x91\\xe6\\xb2\\xa1\\xe6\\x9c\\x89\\xe7\\x94\\xb5@xn--ssdcsrs-2e1xt16k.com.au'"`. Is that what you mean? There should be easier ways of achieving what you're trying to do. – RemcoGerlich Apr 23 '17 at 09:34
  • Or `s = r"b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au'"`, note the `r`, that makes it unnecessary to escape the backslashes. – RemcoGerlich Apr 23 '17 at 09:35
  • As you suggest @RemcoGerlich, repr() seems to get me most of the way there. Now I just need to strip off the leading b and the surrounding quotes and I have what I am after, thank you. – Duke Dougal Apr 23 '17 at 09:38
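For later readers, the approach from the last comment could be sketched roughly as follows. The slice assumes the repr has the form b'...' (or b"..."), and the second variant uses the "backslashreplace" error handler, which works for decoding from Python 3.5 on. The names `data`, `via_repr` and `via_handler` are illustrative only:

```python
data = b'\xe6\x88\x91\xe6\xb2\xa1\xe6\x9c\x89\xe7\x94\xb5@xn--ssdcsrs-2e1xt16k.com.au'

# repr() gives "b'...'"; drop the leading b and quote, and the trailing quote.
via_repr = repr(data)[2:-1]

# Alternative: decode as ASCII and write each non-ASCII byte as a \xNN escape.
via_handler = data.decode("ascii", errors="backslashreplace")

print(via_repr)
print(via_repr == via_handler)
```

The two variants agree for this input, but they are not identical in general: repr() also escapes backslashes, quotes and control characters such as newlines, while the error handler only touches bytes that are not valid ASCII.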