Extra \N{...} when using kCFStringTransformToUnicodeName or NSStringTransformToUnicodeName

Question

let string = "\u{00A0}" // no-break space
let transformed = string.stringByApplyingTransform(NSStringTransformToUnicodeName, reverse: false)

Expected result: NO-BREAK SPACE

Actual result: \N{NO-BREAK SPACE}

Why the extra \N{ and }? What are they for, and is there any way to remove them, short of regex/scanning/parsing/etc?

AliSoftware · Accepted Answer · 2015-10-31T13:42:57.167

That's the way ICU & Unicode represent named code points in Regular Expressions. So I'm not surprised by that output at all.

Here is a link that references this syntax at unicode.org.

It is also explained on this other page from the ICU Project.

PS: \N{…} is actually shorthand for \p{name=…}, as explained on that unicode.org page, just above the linked anchor. Similar syntax is described at regular-expressions.info, which covers the \p{…} form for matching Unicode code points by their properties.
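
If you only need the bare names, for example for debug output, one possible alternative that avoids stripping the \N{ and } wrapper is to read each scalar's name directly. This is only a sketch and not what the transform does internally: it assumes Swift 5 or later, where Unicode.Scalar.Properties.name is available, and unicodeNames(in:) is a hypothetical helper name, not a Foundation API.

```swift
// Sketch only: `unicodeNames(in:)` is a hypothetical helper, not part of Foundation.
// Assumes Swift 5 or later, where Unicode.Scalar.Properties.name exists.
func unicodeNames(in string: String) -> [String] {
    return string.unicodeScalars.map { scalar in
        // `name` is nil for scalars that have no Unicode name
        // (control characters, for example).
        if let name = scalar.properties.name {
            return name
        }
        // Fall back to U+XXXX notation, zero-padded to at least four hex digits.
        let hex = String(scalar.value, radix: 16, uppercase: true)
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }
}

let names = unicodeNames(in: "\u{00A0}")
print(names) // ["NO-BREAK SPACE"]
```

This returns one entry per Unicode scalar, so a multi-scalar character such as an emoji with a modifier yields several names. The transform in the question instead produces a single string with each name wrapped in \N{…}.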

edited Oct 31 '15 at 13:42

answered Oct 31 '15 at 00:48

Thanks for the answer. I was going to strip those characters out, but now that I know that they’re official unicode code point names, I’ll leave them in. (This is for printing out a readable debug version of a string that may contain special characters that are hard to read in a monospace font.) – Zev Eisenberg Oct 31 '15 at 01:07
