1

String.fromCodePoint(...[127482, 127480]) gives me a flag of the US (🇺🇸).

How do I turn the flag back to [127482, 127480]?

Sebastian Simon
ppt
  • 4
    `Array.from("🇺🇸", (codeUnit) => codeUnit.codePointAt())`? That’s basically the inverse operation… – Sebastian Simon May 21 '21 at 07:47
  • @SebastianSimon - I hadn't thought of `Array.from` and its mapping callback, good call, that's got to be the closest inverse. But note that `codeUnit` isn't an accurate name, since it will be a string containing the full Unicode code *point*, not just a code unit. – T.J. Crowder May 21 '21 at 08:09
  • 1
    @T.J.Crowder The method name `codePointAt` always confused me… is a code point the number or the “character” (i.e. a string; or is that a grapheme (cluster) or glyph?)? `codePoint.codePointAt()` sounds like I want to get the code point of the code point, which doesn’t really make sense… or should that be `(string) => string.codePointAt()`? – Sebastian Simon May 21 '21 at 08:13
  • 1
    @SebastianSimon - You're not alone there. :-) Yes, I'd go with that last one (in fact, I did when I added your approach to the answer below :-D). A code point is a number that uniquely identifies a "character" in the Unicode standard ("character" is very loose there). A code *unit* is a number that may need to be combined with another number to identify a character, depending on the [transformation format](https://www.unicode.org/faq/utf_bom.html) being used. The flag in this question is a particularly complicated example because it's a two-codepoint thing that identifies an... – T.J. Crowder May 21 '21 at 08:37
  • 1
    ...emoji, so I like to use the winking face (😉) as an example instead: It's code point 0x1F609, character [U+1F609](https://util.unicode.org/UnicodeJsps/character.jsp?a=1F609) in the Unicode database. JavaScript strings are effectively UTF-16 (but tolerating broken surrogate pairs), a 16-bit *transformation format* for Unicode where each value fits in 16 bits. Since Unicode is a 21-bit format, that means some "characters" have to use two 16-bit units -- code *units*. In the case of the winking face, those are 0xD83D and 0xDE09 -- a *surrogate pair* that, in UTF-16, combine to give us the... – T.J. Crowder May 21 '21 at 08:37
  • 1
    ...winking face (😉). In UTF-8, there could be anywhere from 1 to 4 code units required to make up a code point (for the winking face it's 0xF0 0x9F 0x98 0x89). I have a short-ish blog post about this [here](https://thenewtoys.dev/blog/2021/01/26/what-is-a-string/), and I also go into it in detail in Chapter 10 of my recent book (links in my profile). (Apologies for comment length!) – T.J. Crowder May 21 '21 at 08:38

1 Answer

5

You're looking for codePointAt. You can call it directly at the right indexes, or use spread (or similar) to split the string into an array of code points and then map each entry to its number.

console.log(theString.codePointAt(0)); // 127482
console.log(theString.codePointAt(2)); // 127480
// Note −−−−−−−−−−−−−−−−−−−−−−−−−−^
// It's 2 because the first code point in the string occupies two code *units*

or

const array = [...theString].map(s => s.codePointAt(0));
console.log(array); // [127482, 127480]

or, skipping the intermediate array as Sebastian Simon pointed out, via Array.from and its mapping callback:

const array = Array.from(theString, s => s.codePointAt(0));
console.log(array); // [127482, 127480]

Example:

const theString = String.fromCodePoint(...[127482, 127480]);

console.log(theString.codePointAt(0)); // 127482
console.log(theString.codePointAt(2)); // 127480

const array = [...theString].map(s => s.codePointAt(0));
console.log(array);  // [127482, 127480]

const array2 = Array.from(theString, s => s.codePointAt(0));
console.log(array2); // [127482, 127480]
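If you'd rather stay with plain codePointAt and indexes, the "index 2" detail above generalizes to a loop that advances by two code units whenever the code point is above 0xFFFF. This is only a sketch; `codePointsByIndex` is a name I made up for illustration, not a built-in.

```javascript
// Hypothetical helper (not a built-in): collect code points by walking indexes.
function codePointsByIndex(str) {
  const result = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    result.push(cp);
    // Code points above 0xFFFF are stored as a surrogate pair,
    // so they occupy two code units in the string.
    i += cp > 0xFFFF ? 2 : 1;
  }
  return result;
}

const usFlag = String.fromCodePoint(127482, 127480);
console.log(codePointsByIndex(usFlag));       // [127482, 127480]
console.log(codePointsByIndex("a" + usFlag)); // [97, 127482, 127480]
```

In practice the spread or Array.from versions are shorter and do the same walk for you.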

Spread and Array.from both work by using the string's iterator, which iterates by code point, not by code unit as most string methods do.
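To make that difference concrete, here is a small comparison of the code-unit view (length, split) with the code-point view (the iterator), plus a round trip back to the original string. The variable names are just for illustration.

```javascript
const flag = String.fromCodePoint(127482, 127480);

// Code-unit view: four UTF-16 code units (two surrogate pairs)
console.log(flag.length); // 4
const units = flag.split("").map(u => u.charCodeAt(0));
console.log(units); // [55356, 56826, 55356, 56824]

// Code-point view: the iterator yields two one-code-point strings
console.log([...flag].length); // 2
const points = Array.from(flag, s => s.codePointAt(0));
console.log(points); // [127482, 127480]

// Round trip: the code points rebuild the same string
console.log(String.fromCodePoint(...points) === flag); // true
```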

T.J. Crowder