let string = "\u{00A0}" // no-break space
let transformed = string.stringByApplyingTransform(NSStringTransformToUnicodeName, reverse: false)

Expected result: NO-BREAK SPACE

Actual result: \N{NO_BREAK_SPACE}

Why the extra \N{ and }? What are they for, and is there any way to remove them, short of regex/scanning/parsing/etc?

Zev Eisenberg

1 Answer

That's how ICU and Unicode represent named code points in regular expressions, so that output is exactly what I would expect.

This syntax is documented at unicode.org.

It is also explained on another page at the ICU Project.

PS: \N{…} is actually shorthand for \p{name=…}, as explained on that unicode.org page just above the linked anchor. You can see similar syntax at regular-expressions.info, which describes the \p{…} syntax for matching Unicode code points by their properties.

AliSoftware
Thanks for the answer. I was going to strip those characters out, but now that I know that they're official Unicode code point names, I'll leave them in. (This is for printing out a readable debug version of a string that may contain special characters that are hard to read in a monospace font.) – Zev Eisenberg Oct 31 '15 at 01:07
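
For readers who do want the bare names, here is a minimal sketch of stripping the wrapper with plain string operations rather than a regex. It assumes modern Swift (not the Swift 2 syntax used in the question) and assumes every character in the transformed string was rendered as \N{…}. The helper name `unicodeNames(fromTransformed:)` and the sample input are hypothetical and not part of any API.

```swift
import Foundation

/// Hypothetical helper (not a Foundation API): extracts the names from a
/// string of the form "\N{NAME}\N{OTHER NAME}".
/// Assumption: the input consists only of \N{…} sequences.
func unicodeNames(fromTransformed transformed: String) -> [String] {
    return transformed
        // Each name ends with "}", so splitting there yields one piece per name.
        .components(separatedBy: "}")
        // The piece after the final "}" is empty; drop it.
        .filter { !$0.isEmpty }
        // Remove the leading "\N{" (three characters) from each piece.
        .map { $0.hasPrefix("\\N{") ? String($0.dropFirst(3)) : $0 }
}

// Illustrative input, written by hand rather than produced by the transform.
let sample = "\\N{LATIN SMALL LETTER A}\\N{LATIN SMALL LETTER B}"
let names = unicodeNames(fromTransformed: sample)
print(names.joined(separator: ", "))
```

In practice the argument would be the `transformed` value from the question. Returning an array keeps the names separate, so the caller can pick the delimiter for debug output.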