How would I convert characters which are surrogate pairs into hexadecimal?

I've found that using hex() and ord() works for characters with a single code point, such as emojis like "😀". For example:

print(hex(ord("😀")))
# 0x1f600

Similarly, using chr() and int() works for getting the character back from the hexadecimal:

print(chr(int("0x1f600", 16)))
# 😀

However, as soon as I use a surrogate pair, such as an emoji like "👩🏻", the code raises an error:

print(hex(ord("👩🏻")))
TypeError: ord() expected a character, but string of length 2 found

How would I fix this, and how would I convert such hexadecimal back into a character?

  • This has nothing to do with surrogate pairs; `👩🏻` is an [emoji sequence](https://emojipedia.org/emoji-sequence/) made of `👩` (U+1F469, *WOMAN (0xd83d,0xdc69)*) and `🏻` (U+1F3FB, *EMOJI MODIFIER FITZPATRICK TYPE-1-2 (0xd83c,0xdffb)*). – JosefZ Dec 19 '22 at 15:59
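
To see what the comment above describes, the code points of the string can be listed one by one. This is only a minimal sketch; `describe` is a hypothetical helper name, not something from the question, and the emoji is written with `\U` escapes so the snippet does not depend on the file encoding.

```python
import unicodedata

# WOMAN followed by EMOJI MODIFIER FITZPATRICK TYPE-1-2
s = "\U0001F469\U0001F3FB"

def describe(text):
    # Hypothetical helper: (U+XXXX label, Unicode name) for each code point.
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]

print(len(s))  # 2, which is why ord(s) raises TypeError
for label, name in describe(s):
    print(label, name)
```

Python's str is indexed by code point, so iterating over the string and calling ord() on each element avoids the error.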

1 Answer

Since an exact output format wasn't specified, how about:

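def hexify(s):
    # UTF-32 uses exactly 4 bytes per code point, so each group is one code point.
    return s.encode('utf-32-be').hex(sep=' ', bytes_per_sep=4)

def unhexify(s):
    # bytes.fromhex() skips the spaces between the groups.
    return bytes.fromhex(s).decode('utf-32-be')

s = hexify('👩🏻')
print(s)
print(unhexify(s))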

Output:

0001f469 0001f3fb
👩🏻
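
If the UTF-16 surrogate values themselves are what is wanted, the same pattern could probably be reused with `utf-16-be` and 2-byte groups. This is a sketch under that assumption; `hexify_utf16` and `unhexify_utf16` are hypothetical names, and `bytes.hex()` with `sep`/`bytes_per_sep` needs Python 3.8 or later.

```python
def hexify_utf16(s):
    # Hypothetical helper: one 4-digit group per UTF-16 code unit.
    return s.encode("utf-16-be").hex(sep=" ", bytes_per_sep=2)

def unhexify_utf16(h):
    # bytes.fromhex() ignores the spaces; decoding recombines the surrogates.
    return bytes.fromhex(h).decode("utf-16-be")

h = hexify_utf16("\U0001F469\U0001F3FB")
print(h)  # d83d dc69 d83c dffb
print(unhexify_utf16(h) == "\U0001F469\U0001F3FB")  # True
```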

Or similar to your original code:

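def hexify(s):
    # One hex string per code point in the string.
    return [hex(ord(c)) for c in s]

def unhexify(L):
    # int(n, 16) accepts the '0x' prefix that hex() produces.
    return ''.join(chr(int(n, 16)) for n in L)

s = hexify('👩🏻')
print(s)
print(unhexify(s))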

Output:

['0x1f469', '0x1f3fb']
👩🏻
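
Since the output format is open, a variant using the conventional `U+XXXX` notation might also be useful. A rough sketch, with `to_labels` and `from_labels` as hypothetical names:

```python
def to_labels(text):
    # Hypothetical helper: one "U+XXXX" label per code point.
    return [f"U+{ord(c):04X}" for c in text]

def from_labels(labels):
    # Drop the "U+" prefix and parse the remainder as base 16.
    return "".join(chr(int(label[2:], 16)) for label in labels)

labels = to_labels("\U0001F469\U0001F3FB")
print(labels)  # ['U+1F469', 'U+1F3FB']
print(from_labels(labels) == "\U0001F469\U0001F3FB")  # True
```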

Mark Tolonen