I have a website and a Unity project that communicate with one another through a web server using WebSockets. I am encoding and decoding the messages as JSON. On the Unity side, I am using Newtonsoft for JSON and websocketsharp for WebSockets. Messages send fine and everything is working, but now I am trying to get emojis to display correctly in Unity. I was able to create a sprite sheet of all emojis and a dictionary whose keys are their Unicode code points and whose values are their positions in the sprite sheet.

The issue is that when I receive an emoji (for example, the emoji with code point U+1F910), Unity receives it as "\uD83E\uDD10". Is there a way to send the emoji as a string literal of its code point? If not, is there a way to parse the C#-interpreted escape sequence back to the original code point? I have found a regex that converts more common symbols from the above format back to the corresponding symbol, but it does not give me the code point as a string. Here is what I am currently using to do that:

var result = Regex.Replace(
    arrivedMessages[0],
    @"\\[Uu]([0-9A-Fa-f]{4})",
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));

With the above code, if the user sends a symbol such as º, the decoded JSON reads \u00ba, and the regex converts it back to º. When I send an emoji such as 🤐, the JSON reads "\ud83e\udd10" and the regex result is blank. Is there an issue with the regex? Or is there a better way to go about doing this? Thanks!

Edit:

To simplify the overall question: is there a way to convert "\uD83E\uDD10" back to a string literal of the code point, "U+1F910"?

TEEBQNE
  • JavaScript uses UTF-16 internally, so these two values are simply the two surrogates for the U+1F910 code point. – Mr Lister Mar 09 '19 at 12:52
  • @MrLister Thanks! I was unaware of that. I am able to convert the surrogate pairs back to the code point, but the issue I am having now is getting a working regex to find the surrogate pairs and replace them. Would you have any idea how to manage that? Thanks! – TEEBQNE Mar 09 '19 at 21:54

1 Answer

Here is the function I ended up using to convert the surrogate pairs as @Mr Lister pointed out:

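        // The method name is illustrative; the parameter is the received string.
        string ToCodePointNotation(string SurrogatePairString)
        {
            string returnValue = "";

            // Step by 2 over a surrogate pair, by 1 over any other character.
            for (var i = 0; i < SurrogatePairString.Length; i += char.IsSurrogatePair(SurrogatePairString, i) ? 2 : 1)
            {
                var codepoint = char.ConvertToUtf32(SurrogatePairString, i);

                // keep it uppercase for the regex, then when it is found, .ToLower()
                // Append rather than assign, so earlier code points are not overwritten.
                returnValue += String.Format("U+{0:X4}", codepoint);
            }

            return returnValue;
        }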
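
For the regex part raised in the comments, here is a rough sketch of how the escaped pairs could be found and replaced while they are still in their raw `\uXXXX` text form. The class and method names (`EmojiEscapes`, `ReplaceEscapedPairs`) are made up for illustration, and it assumes the message still contains the literal backslash escapes rather than already-decoded characters:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Newtonsoft or websocket-sharp.
public static class EmojiEscapes
{
    // An escaped high surrogate (D800-DBFF) immediately followed by
    // an escaped low surrogate (DC00-DFFF), in either letter case.
    private static readonly Regex EscapedPair = new Regex(
        @"\\u(D[89AB][0-9A-F]{2})\\u(D[C-F][0-9A-F]{2})",
        RegexOptions.IgnoreCase);

    // Replaces every escaped surrogate pair with "U+XXXX" notation and
    // leaves all other text, including single \uXXXX escapes, untouched.
    public static string ReplaceEscapedPairs(string text)
    {
        return EscapedPair.Replace(text, m =>
        {
            char high = (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier);
            char low = (char)ushort.Parse(m.Groups[2].Value, NumberStyles.AllowHexSpecifier);

            // Combines the two UTF-16 code units into one code point.
            int codePoint = char.ConvertToUtf32(high, low);
            return string.Format("U+{0:X4}", codePoint);
        });
    }
}
```

Calling `EmojiEscapes.ReplaceEscapedPairs(@"hi \uD83E\uDD10")` gives `hi U+1F910`. Escapes outside the surrogate range, such as `\u00ba`, are not matched, so the regex from the question can still handle those afterwards. If the JSON deserializer has already turned the escapes into real characters, this regex will find nothing and the loop above is the one to use.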
TEEBQNE