
I am trying to insert a Unicode hyphen-minus character into a text string. I am seeing an "Invalid universal character" error with the following:

U+002D (hyphen-minus)

[textViewContent insertString:@"\u002D" atIndex:cursorPosition.location];

However, these work fine:

U+2212 (minus)

[textViewContent insertString:@"\u2212" atIndex:cursorPosition.location];

U+2010 (hyphen)

[textViewContent insertString:@"\u2010" atIndex:cursorPosition.location];

I've looked through several of the existing Unicode discussions here, but I have not found one that explains what is different among my examples that causes the first one to fail. Insight greatly appreciated.

DenVog

1 Answer


Universal character names have some restrictions on their use. In C99 and C++98 you were not allowed to use one that refers to a character in the basic character set (which includes U+002D).

C++11 relaxed this requirement: inside a string or character literal you are now allowed to use a UCN that refers to a basic character. Depending on the compiler version you're using, I would guess you could compile your code as Objective-C++ with C++11 enabled to make it legal.
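For example, a minimal sketch of the difference, assuming clang and the same textViewContent and cursorPosition names from the question (compiling as Objective-C++ normally just means renaming the .m file to .mm):

// Compiled as Objective-C (.m, C99 UCN rules): the line below is rejected
// with "Invalid universal character", because \u002D names a character in
// the basic character set.
// Compiled as Objective-C++ with -std=c++11 (.mm): the same UCN is accepted
// inside a string literal.
[textViewContent insertString:@"\u002D" atIndex:cursorPosition.location];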

That said, since this character is part of ASCII and the basic character set, why don't you just write it literally?

@"-"
bames53
  • Thank you for the reply and background. Perhaps I am mistaken, but I thought there was some risk in writing the character literally if the end user has a language or keyboard selected that isn't Roman (e.g. Chinese, Hebrew, etc.). That's why I was trying to go the Unicode route. – DenVog Jun 21 '12 at 23:09
  • There may be such problems for some characters. But this character is ASCII and part of the basic character set. – bames53 Jun 21 '12 at 23:26
  • Actually, I guess I repeated myself there. But the reason that being part of the basic character set matters takes a bit to explain. To keep it short: almost any problem you'd have writing a character from the basic character set literally in a character or string literal, you would also encounter with UCNs. The only way UCNs might work when a literal `-` doesn't is if your compiler expects a non-ASCII input file encoding, and in that case you're going to have bigger problems than this. – bames53 Jun 21 '12 at 23:39
  • For the full story you'd have to read about how your compiler converts from physical source file characters to the source character set, and how character and string literals are converted to the various execution charsets. You'd also need to know how the execution charset matters when a user runs the program, and how the user's environment affects how the program runs and how its output is displayed. – bames53 Jun 21 '12 at 23:47
  • @DenVog Whether the user's keyboard can enter the character has no bearing on how the compiler interprets the input. [This answer](http://askubuntu.com/a/20976) gives a little insight, but the short answer is that it's supposed to let different *programmers* refer to the same variable as `føø` and `f\u00f8\u00f8`. In Java, the compiler has to parse Unicode escapes *before* tokenization, so you can do crazy things like `/* comment "\u002a/ evil code /\u002a" comment */`, which some IDEs parse incorrectly. To make it simpler for computers/humans to parse, C99 doesn't allow these error-prone `\u` escapes. – tc. Apr 23 '13 at 18:26