Get the last character of a string in modern JavaScript, allowing for astral characters such as emoji that use surrogate pairs (two code units)

Question

Unicode characters (code points) outside the Basic Multilingual Plane (BMP) are represented in JavaScript strings by two chars (code units), called a surrogate pair.

'ab' is two code units and two code points. (So two chars and two characters.)

'a😀' is three code units and two code points. (So three chars and two characters.)
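As a minimal sketch of the distinction above, the snippet below compares code-unit and code-point counts. U+1F600 is used purely as an illustrative astral code point; any character outside the BMP would behave the same way.

```javascript
// 'a' followed by U+1F600, an astral code point chosen for illustration.
const s = 'a\u{1F600}';

console.log(s.length);                      // 3: length counts UTF-16 code units
console.log([...s].length);                 // 2: the string iterator yields code points
console.log(s.codePointAt(1).toString(16)); // 1f600: the full code point
console.log(s.charCodeAt(1).toString(16));  // d83d: only the high surrogate
```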

My code does not need to work with old versions of JavaScript. ES6 or whatever is most modern.

How can I access the last character, irrespective of whether it's an astral character or not?

Splitting the string into "all but last character" and "final character" is also fine.

@Andreas: Oh god, that's both really ugly and utterly beautiful simultaneously. — T.J. Crowder, Jul 11 '17 at 15:23
@hippietrail: Andreas' solution works because the `String` iterator iterates code points, not "JavaScript characters" (code units), and so that spreads the string out into code points in an array -- then grabs the last one from the array via `pop`. — T.J. Crowder, Jul 11 '17 at 15:25
Awesome! I felt positive there would be at least one nifty way to achieve this with the new ES6 stuff. I could only think of `Array.from('a😀')` or `'a😀'.match(/.$/u)` so far though. — hippietrail, Jul 11 '17 at 15:29
@hippietrail: I'd be interested to see a performance comparison between Andreas' spread approach and your regex approach, actually. In fact, I was so interested I did one, and regex won by a fair bit on V8 in Chrome, and by a **lot** on SpiderMonkey in Firefox: https://jsperf.com/approaches-to-getting-the-last-code-point-in-string I recommend posting it as an answer, and (with apologies to Andreas) accepting it. — T.J. Crowder, Jul 11 '17 at 15:37
There may well be other approaches none of us have thought of too. If mine is faster I suppose I should submit it as an answer after all. Mine is also easy to extend to the second part of my question too. — hippietrail, Jul 11 '17 at 15:45
@hippietrail: Yeah. In fact, you might leave the question with no accepted answer for a day or so... As you say, it's not only faster, but it's at least as elegant and flexible. — T.J. Crowder, Jul 11 '17 at 15:48
Just make sure you really mean "last character" and not something similar to last "visible symbol", as there can be a lot of characters modifying the last "visible symbol", e.g. but not limited to [variation selectors](https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)) for emoji. Also see [this question](https://stackoverflow.com/questions/44979260/filtering-empty-strings-from-a-string-containing-emojis-using-spread-syntax#44979358) for unexpected results regarding this. — ASDFGerte, Jul 11 '17 at 15:58
@ASDFGerte: Yes for the sake of this question "character" means "codepoint". I also don't care about grapheme clusters, etc. Those are all at a higher level and this question is about the lower levels. — hippietrail, Jul 11 '17 at 16:01
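To make the code point versus "visible symbol" distinction from these comments concrete, here is a small sketch. The heart plus variation selector sequence is an illustrative choice, not an example taken from the thread.

```javascript
// U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16:
// one visible symbol, but two code points (both inside the BMP).
const heart = '\u2764\uFE0F';

const codePoints = [...heart];
console.log(codePoints.length); // 2

// The last code point is the invisible variation selector, not the heart.
const last = codePoints[codePoints.length - 1];
console.log(last.codePointAt(0).toString(16)); // fe0f
```

For the purposes of this question that is the intended result, since "character" here means code point.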

score 2 · Accepted Answer · edited Jul 11 '17 at 15:32

Spreading a string splits it into its code points, because the string iterator yields code points rather than code units:

[...'a😀'].pop()
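A minimal sketch of the same idea wrapped in a function. The name `lastCodePoint` is hypothetical and not part of the answer; U+1F600 is only an illustrative astral character.

```javascript
// lastCodePoint is a hypothetical helper name used for illustration.
function lastCodePoint(str) {
  // Spread uses String.prototype[Symbol.iterator], which yields code points,
  // so a surrogate pair arrives as a single array element.
  return [...str].pop(); // undefined for the empty string
}

console.log(lastCodePoint('ab') === 'b');                 // true
console.log(lastCodePoint('a\u{1F600}') === '\u{1F600}'); // true
console.log(lastCodePoint(''));                           // undefined
```

Note that this builds an array of every code point in the string just to read the last one.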

edited Jul 11 '17 at 15:32 by T.J. Crowder
answered Jul 11 '17 at 15:32 by Andreas

this doesn't work in all cases try `[...'a'].pop()` – Krimson Jun 21 '23 at 17:47

score 1 · Answer 2 · edited Jul 11 '17 at 15:59

I knew from answers to other SO questions that Array.from() and regular expressions with the /u flag both handle non-BMP Unicode characters correctly, but I didn't think either was likely to be the best answer.

Maybe I was wrong, so here are two solutions:

Array.from()

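let chars = Array.from('a😀');       // array of code points, not code units
let c = chars[chars.length - 1];    // last code point, without mutating the array
console.log(c);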

u flag

// With the u flag, . matches a whole code point, including a surrogate pair.
let c = 'a😀'.match(/.$/u)[0];
console.log(c);

This second approach can be extended to answer the second part of my question too:

let [,l,r] = 'abcd'.match(/(.*)(.)/u);
console.log(l);
console.log(r);

(No anchor needed as the .* will be greedy.)
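A hedged sketch of the split as a reusable function. The name `splitLast` is hypothetical. Unlike the snippet above, it anchors the pattern and adds the `s` (dotAll, ES2018) flag. This is an adjustment to the answer, not part of it. Without `s`, `.` does not match line terminators, so on a multi-line input such as 'a\nb' the unanchored pattern matches within the first line only.

```javascript
// splitLast is a hypothetical helper name used for illustration.
function splitLast(str) {
  // u: . matches a whole code point (a surrogate pair counts once).
  // s: . also matches line terminators, so multi-line input is handled.
  const m = str.match(/^(.*)(.)$/su);
  return m ? [m[1], m[2]] : null; // null for the empty string
}

console.log(splitLast('abc\u{1F600}')); // rest 'abc', last U+1F600
console.log(splitLast('a\nb'));         // rest 'a\n', last 'b'
console.log(splitLast(''));             // null
```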

edited Jul 11 '17 at 15:59 by T.J. Crowder
answered Jul 11 '17 at 15:57 by hippietrail

The `Array.from` is very much like Andreas', both use the string iterator to get an array of code points and then take the last entry from the array. (Yours avoids subsequently mutating the array, which aids performance.) The regex is smarter, though, because it pushes as much into the internals of the JavaScript engine as possible, where it can be optimized. – T.J. Crowder Jul 11 '17 at 15:59
I tried to add the Array.from method to the jsperf but it just keeps telling me "Please review required fields and save again." and the site is very slow for me here in Laos right now. \-: – hippietrail Jul 11 '17 at 16:15
Not just you, not just in Laos. jsPerf is frequently very slow or even offline. – T.J. Crowder Jul 11 '17 at 16:36
[Regex wins](https://jsperf.com/get-last-true-character-again), `Array.from` is the slowest. (Which isn't really surprising, it's a more complex method with an optional mapping function.) – T.J. Crowder Jul 11 '17 at 16:46
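One further approach, not proposed in this thread and not benchmarked here, is to inspect the trailing code units directly instead of using a regex or building an array. The sketch below assumes the standard UTF-16 surrogate ranges (high 0xD800 to 0xDBFF, low 0xDC00 to 0xDFFF); `lastCodePointByUnits` is a hypothetical name.

```javascript
// lastCodePointByUnits is a hypothetical helper name used for illustration.
function lastCodePointByUnits(str) {
  const n = str.length;
  if (n === 0) return undefined;

  const low = str.charCodeAt(n - 1);
  // If the final unit is a low surrogate preceded by a high surrogate,
  // the last code point spans the final two code units.
  if (n >= 2 && low >= 0xDC00 && low <= 0xDFFF) {
    const high = str.charCodeAt(n - 2);
    if (high >= 0xD800 && high <= 0xDBFF) return str.slice(n - 2);
  }
  // Otherwise it is a single code unit (a BMP character or a lone surrogate).
  return str.slice(n - 1);
}

console.log(lastCodePointByUnits('ab') === 'b');                 // true
console.log(lastCodePointByUnits('a\u{1F600}') === '\u{1F600}'); // true
```

It only ever looks at the last two code units, whatever the string length.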
