13

PHP allows Unicode identifiers for variables, functions, classes and constants. This was presumably intended for localized applications. Whether it's a good idea to code an API in anything but English is debatable, but it's undisputed that some development settings could demand it.

 $Schüssel = new Müsli(T_FRÜCHTE);

But PHP allows more than just \p{L} in identifiers. You can use virtually any Unicode character; only the punctuation in the ASCII range is off limits (e.g. : is special, and \ is already used as an internal hack to support namespaces).
Anyway, you could do so, and I would even consider it a workable use for fun projects:

 throw new ಠ_ಠ("told you about the disk space before");

But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?

For example, I'm pondering this for embedding parameters into magic method names. In my case I only need to inject numeric parameters, so I could get away with just the underscore:

 $what->substr_0_50->ascii("text");
  // (Let's skip the evilness discussion this time. Not quite sure
  // yet if I really want it, but the conciseness might make sense.)
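For illustration only, here is a minimal sketch of how such underscore-embedded parameters could be unpacked with `__get` and `__call`. The class name `FilterChain`, the allowlist and the use of `strtoupper` as the final call are hypothetical stand-ins, not the actual implementation behind the snippet above:

```php
<?php
// Hypothetical sketch: property names such as "substr_0_10" are split on "_"
// into a function name plus integer arguments and queued as filter steps.
final class FilterChain
{
    // Assumed allowlist so that arbitrary function names cannot be invoked.
    private const ALLOWED = ['substr', 'strtoupper', 'trim'];

    private array $steps = [];

    // "$chain->substr_0_10" arrives here with $name = "substr_0_10".
    public function __get(string $name): self
    {
        $parts = explode('_', $name);
        $func  = array_shift($parts);
        $this->steps[] = [self::check($func), array_map('intval', $parts)];
        return $this;
    }

    // The final method call supplies the input; queued steps run first.
    public function __call(string $name, array $args): string
    {
        $value = (string) ($args[0] ?? '');
        foreach ($this->steps as [$func, $params]) {
            $value = $func($value, ...$params);
        }
        $this->steps = [];
        return (self::check($name))($value);
    }

    private static function check(string $func): string
    {
        if (!in_array($func, self::ALLOWED, true)) {
            throw new InvalidArgumentException("Unsupported filter: $func");
        }
        return $func;
    }
}

$what = new FilterChain();
echo $what->substr_0_10->strtoupper("hello unicode world"), "\n"; // HELLO UNIC
```

Splitting on the underscore also breaks any function name that itself contains one (str_replace, for instance), which is one more reason a different separator looks attractive.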

But if I wanted to embed other textual parameters, I would need another Unicode character as the separator. That's harder to type, but perhaps there's one that would aid readability and convey the meaning?

 ->substr✉0✉50->   // doesn't look good

So, the question in this case: Which symbol makes sense as separator for mixed-in parameters in a virtual function name. -- Broader meta topic: Which uses of Unicode identifiers do you know about, or would you consider okayish?

tshepang
mario

2 Answers

22

Just to make it clear: PHP does not support Unicode, and it doesn't support Unicode labels. To be more precise, PHP defines a LABEL as [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. As you can see, it allows only a small range of bytes apart from the typical alphanumerics and underscore. The fact that your Unicode labels are still accepted is only an artifact of PHP's lack of Unicode support. Your special characters are several bytes long in UTF-8, PHP treats each of those bytes as a separate character, and, with the characters you tried, each byte happened to fall into the \x7f-\xff range mentioned above.
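A quick way to see this is to apply the quoted LABEL pattern byte-wise (no /u modifier) to the names from the question. This is only a sketch: the helper `is_label` is a made-up name, and the source file is assumed to be saved as UTF-8.

```php
<?php
// The LABEL pattern as quoted above, anchored and matched byte by byte.
const LABEL = '/\A[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\z/';

// Hypothetical helper: true if $name consists only of bytes LABEL accepts.
function is_label(string $name): bool
{
    return preg_match(LABEL, $name) === 1;
}

$name = "ಠ_ಠ"; // assumes this file is saved as UTF-8

// U+0CA0 encodes to the three bytes E0 B2 A0, all within \x7f-\xff.
echo bin2hex($name), "\n";        // e0b2a05fe0b2a0
var_dump(is_label($name));        // bool(true)
var_dump(is_label("Schüssel"));   // bool(true)
var_dump(is_label("a:b"));        // bool(false), ':' is ASCII punctuation
var_dump(is_label("\xff\xfe"));   // bool(true), although not valid UTF-8
```

The last case shows that the pattern never checks whether the bytes form valid UTF-8; any sequence of high bytes passes.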

Further reading on that topic: Exotic names for methods, constants, variables and fields - Bug or Feature?

Community
NikiC
  • Haha okay, PHP5 only allows them accidentally. It's oblivious to whether a character sequence is valid UTF-8 or just an L1 byte. The 0x80 byte range is included for future compat. – mario Mar 18 '11 at 23:57
  • It's not "accidental" in that UTF-8 encodes all characters after U+007F as multiple bytes in the 0x80-0xFF range. – tripleee Nov 01 '14 at 07:07
5

Which symbol makes sense as separator for mixed-in parameters in a virtual function name.

\u2639?

But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?

The biggest hurdle after font support is going to be picking a character that can actually be typed. Outside of a macro or copy/paste, Unicode characters are not spectacularly easy to enter. Forcing this upon others is very likely going to violate the "assume the people that work with your code after you are murderous psychopaths that know where you live" rule.

We use Unicode characters in only a few comments in our codebase, like

// Even though this is the end of the file and we should get an implicit exit, 
// if we don't actually expressly exit here, PHP segfaults.
// ♫ Oh, PHP, I love you. ♫

I think that falls into the "amusement and decorative" category. Or the "shoot self in head after slaughtering the php-internals team" category. Pick one.

Anyway, this is not a good idea because it's going to make your code hard to modify.

Charles
  • I guess that's the actual deal breaker. If you depend on autocomplete driven development or need copy'n'paste for extension, any supposed readability advantage might pale in comparison. – mario Mar 19 '11 at 00:07