I've thrown myself into the deep end of the pool, so forgive me if I'm struggling a bit.
For background:
JavaScript strings - UTF-16 vs UCS-2?
https://mathiasbynens.be/notes/javascript-encoding
I've been looking at the two links above, and they leave me with some questions.
It appears (based on my understanding) that you can represent a Unicode code point that doesn't fit in 16 bits (anything above U+FFFF) using a surrogate pair of JSON escapes, something like "\uD834\uDF06".
First question: Is that accurate? Is this how you represent a code point above U+FFFF in JSON? (I've heard JavaScript engines are a bit weird because the spec predates UTF-16, so they might not treat a surrogate pair as one character, but I'd rather not have to care about that. I hope I don't have to.)
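To make the first question concrete, here is the arithmetic as I currently understand it: a rough sketch, not tested on real hardware. The function names `combine_surrogates` and `utf8_encode` are my own (hypothetical), and returning 0 on failure is just a convention I picked for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into a single code point.
   Returns 0 if hi/lo are not a valid high+low pair (my own convention). */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    /* Casts to uint32_t keep the shift safe where int is 16 bits. */
    return 0x10000UL
         + (((uint32_t)hi - 0xD800UL) << 10)
         + ((uint32_t)lo - 0xDC00UL);
}

/* Encode one code point as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 if cp is a lone
   surrogate or is above U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80UL) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0; /* surrogates are not encodable on their own */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

If I have this right, "\uD834\uDF06" combines to U+1D306, which comes out as the four UTF-8 bytes F0 9D 8C 86.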
Second question: Assuming that's accurate, is it somehow possible to form a valid surrogate pair from one JSON escape plus a couple of extended characters in the same string? Should my code handle that? What I mean is: if I encounter something like "\uD834��", where � is an arbitrary value, possibly in the extended character range, should I fail due to an invalid surrogate pair, or should I treat the � characters as the second half of the pair? (My characters are one byte each because I'm using UTF-8 internally, so the two extended characters above would be 16 bits total.)
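For reference, the "fail" option from the second question would look roughly like the sketch below: after reading a high-surrogate escape, only accept another `\uXXXX` escape in the range DC00..DFFF and reject anything else, including raw extended bytes. The names `hex_value` and `read_low_surrogate_escape` are hypothetical, and the sketch assumes the input is NUL-terminated. Whether this strict behavior is the right one is exactly what I'm asking.

```c
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* p points just past a high-surrogate escape such as "\uD834".
   Returns 1 and stores the value in *lo only if the next six
   characters are a \uXXXX escape in the range DC00..DFFF.
   Returns 0 for anything else, raw extended bytes included. */
int read_low_surrogate_escape(const char *p, uint16_t *lo)
{
    uint16_t v = 0;
    int i;

    if (p[0] != '\\' || p[1] != 'u')
        return 0;
    for (i = 2; i < 6; ++i) {
        int d = hex_value(p[i]);
        if (d < 0)
            return 0; /* also stops safely at the terminating NUL */
        v = (uint16_t)((v << 4) | (uint16_t)d);
    }
    if (v < 0xDC00 || v > 0xDFFF)
        return 0;
    *lo = v;
    return 1;
}
```

With this version, the text after "\uD834" in my example would be rejected, because two raw bytes are not a `\u` escape.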
Does that even make sense? I'm not even sure I'm asking the right questions here so forgive me. I am very new at this.
I have to know this, by the way, rather than relying on existing libraries, because my JSON library targets platforms including the Arduino, and on that platform everything is roll-your-own.