I have input strings in which some characters are written as UTF-16 code units escaped with '\u'. In Perl, I am trying to convert all of these strings to UTF-8. For example, the string 'Alice & Bob & Carol'
might appear in the input as:
'Alice \u0026 Bob \u0026 Carol'
To do the conversion, I was using:
$str =~ s/\\u([A-Fa-f0-9]{4})/pack("U", hex($1))/eg;
...which worked fine until I encountered input strings containing UTF-16 surrogate pairs, such as:
'Alice \ud83d\ude06 Bob'
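For reference, a surrogate pair encodes a code point above U+FFFF as a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF). Each half carries 10 bits of the code point, offset by 0x10000. Converting each half separately, as the code above does, produces two lone surrogates rather than one character. A minimal sketch of the recombination arithmetic follows; the helper name `surrogates_to_codepoint` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: combine a UTF-16 high/low surrogate pair
# into the single Unicode code point it encodes.
sub surrogates_to_codepoint {
    my ($hi, $lo) = @_;
    # Top 10 bits come from the high surrogate, bottom 10 from the low one,
    # and the whole value is offset by 0x10000.
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

# \ud83d\ude06 from the example above
my $cp = surrogates_to_codepoint(hex('d83d'), hex('de06'));
printf "U+%X\n", $cp;    # prints U+1F606
```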
How do I modify the pack-based code above so that it handles UTF-16 surrogate pairs? I would prefer a solution that uses only pack, without any additional modules (JSON::XS, Encode, etc.).
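One possible pack-only approach, sketched here under the assumption that a high surrogate escape is always immediately followed by its low surrogate escape, is to extend the original substitution with an alternative that matches the pair first and combines it before calling `pack("U", ...)`. The sub name `decode_u_escapes` is hypothetical; no modules beyond `strict` and `warnings` are used:

```perl
use strict;
use warnings;

# Hypothetical helper: the original pack("U", hex(...)) substitution,
# extended with a branch that recombines UTF-16 surrogate pairs.
sub decode_u_escapes {
    my ($str) = @_;
    $str =~ s{
        \\u([Dd][89ABab][0-9A-Fa-f]{2})   # group 1: high surrogate, D800-DBFF
        \\u([Dd][C-Fc-f][0-9A-Fa-f]{2})   # group 2: low surrogate,  DC00-DFFF
      | \\u([0-9A-Fa-f]{4})               # group 3: any other escape
    }{
        defined $3
          ? pack("U", hex $3)
          : pack("U", 0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))
    }egx;
    return $str;
}

my $out = decode_u_escapes('Alice \ud83d\ude06 Bob \u0026 Carol');
printf "U+%X\n", ord substr($out, 6, 1);    # prints U+1F606
```

Because the pair alternative is tried first at each position, a valid pair becomes one character, while an unpaired surrogate falls through to the generic branch and is left as a lone surrogate. Like the original code, the result is a Perl character string; writing it through a filehandle with the `:utf8` layer (for example `binmode STDOUT, ':utf8'`) emits the UTF-8 bytes.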