How to convert surrogate pairs into hexadecimal, and vice-versa in Python?

Question

How would I convert characters which are surrogate pairs into hexadecimal?

I've found that using hex() and ord() works for characters with a single code point, such as emojis like "😀". For example:

print(hex(ord("😀")))
# 0x1f600

Similarly, using chr() and int() works for getting the characters from the hexadecimal:

print(chr(int("1f600", 16)))
# 😀

However, as soon as I use a surrogate pair, such as an emoji like "👩🏻", the code throws an error:

print(hex(ord("👩🏻")))
TypeError: ord() expected a character, but string of length 2 found

How would I fix this, and how would I convert such hexadecimal back into a character?

Nothing to do with surrogate pairs; `👩🏻` is an [emoji sequence](https://emojipedia.org/emoji-sequence/) made of `👩` (U+1F469, *WOMAN*; UTF-16 surrogates 0xd83d, 0xdc69) and `🏻` (U+1F3FB, *EMOJI MODIFIER FITZPATRICK TYPE-1-2*; UTF-16 surrogates 0xd83c, 0xdffb). — JosefZ, Dec 19 '22 at 15:59
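The comment's point can be checked directly: the string has length 2 in Python because it holds two code points, which is why ord() rejects it. Below is a minimal sketch, assuming only the standard `unicodedata` module; the helper name `code_points` is made up for illustration and the emoji is written with escapes so the snippet survives copy and paste.

```python
import unicodedata

def code_points(s):
    """Return a (U+XXXX, Unicode name) tuple for every code point in s."""
    return [(f'U+{ord(ch):04X}', unicodedata.name(ch, '<no name>')) for ch in s]

# Woman (U+1F469) followed by a skin-tone modifier (U+1F3FB).
s = '\U0001F469\U0001F3FB'

print(len(s))  # 2, so ord(s) raises TypeError
for cp, name in code_points(s):
    print(cp, name)
```

Iterating over the string yields one code point at a time, so ord() works on each element even though it fails on the whole string.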

score 0 · Accepted Answer · answered Dec 19 '22 at 23:51

Since an exact output format wasn't specified, how about:

def hexify(s):
    return s.encode('utf-32-be').hex(sep=' ', bytes_per_sep=4)

def unhexify(s):
    return bytes.fromhex(s).decode('utf-32-be')

s = hexify('👩🏻')
print(s)
print(unhexify(s))

Output:

0001f469 0001f3fb
👩🏻

Or similar to your original code:

def hexify(s):
    return [hex(ord(c)) for c in s]

def unhexify(L):
    return ''.join([chr(int(n,16)) for n in L])

s = hexify('👩🏻')
print(s)
print(unhexify(s))

Output:

['0x1f469', '0x1f3fb']
👩🏻
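If the goal really is the UTF-16 surrogate values (the 0xd83d, 0xdc69 style numbers), one possible approach is to encode to UTF-16 and read the result two bytes at a time. This is only a sketch under that assumption; the function names `to_utf16_hex` and `from_utf16_hex` are hypothetical and not part of the answer's code.

```python
def to_utf16_hex(s):
    """Return the UTF-16 code units of s as a list of hex strings."""
    data = s.encode('utf-16-be')  # big-endian, no byte order mark
    return [hex(int.from_bytes(data[i:i + 2], 'big'))
            for i in range(0, len(data), 2)]

def from_utf16_hex(units):
    """Rebuild a string from UTF-16 code units given as hex strings."""
    data = b''.join(int(u, 16).to_bytes(2, 'big') for u in units)
    return data.decode('utf-16-be')

s = '\U0001F469\U0001F3FB'
units = to_utf16_hex(s)
print(units)                       # ['0xd83d', '0xdc69', '0xd83c', '0xdffb']
print(from_utf16_hex(units) == s)  # True
```

Each code point above U+FFFF becomes two code units (a high and a low surrogate), so the two-code-point emoji produces four values.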
