I have always been under the impression that all PHP function names had to begin with [a-zA-Z].

For instance, this would work:

function a1() {
  return "Something, because I'm written properly.";
}

... while this would not:

function 1a() {
  return "Nothing, because you'll encounter an error before this function ever runs.";
}
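
To check the two cases above without aborting the whole script, the definitions can be passed to eval() and the resulting ParseError caught. This is only a minimal sketch: it assumes PHP 7 or later (where eval() throws ParseError on invalid code), and the helper name defines_cleanly is made up for illustration.

```php
<?php
// Hypothetical helper: returns true if $source parses and runs, false on a parse error.
// Assumes PHP 7+, where eval() throws ParseError instead of raising a fatal error.
function defines_cleanly(string $source): bool
{
    try {
        eval($source);
        return true;
    } catch (ParseError $e) {
        return false;
    }
}

// Name starts with a letter: accepted.
var_dump(defines_cleanly('function a1() { return "Something"; }'));

// Name starts with a digit: rejected by the parser.
var_dump(defines_cleanly('function 1a() { return "Nothing"; }'));
```

Each definition is evaluated only once here, because declaring the same function name twice would be a separate fatal error unrelated to naming rules.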

However, the character you get from rawurldecode('%E2%80%A9') (U+2029 PARAGRAPH SEPARATOR, when the document displaying it declares a UTF-8 content type) can also be used to define a function.

In a text editor that does not display hidden characters, it ultimately looks as if the function has been defined as function () {, and it can then be executed by calling what appears to be no more than ();
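
One way to reproduce this without pasting the invisible character itself is to build the name from its bytes and define the function through eval(). The following is a rough sketch rather than the exact file from the question; the variable name $name and the return string are placeholders chosen for illustration.

```php
<?php
// The three UTF-8 bytes E2 80 A9 encode U+2029 (PARAGRAPH SEPARATOR).
$name = rawurldecode('%E2%80%A9');

// Define a function whose entire name is that invisible character.
// eval() is used only so that this listing contains no hidden characters.
eval("function {$name}() { return 'called the invisible function'; }");

var_dump(function_exists($name)); // is the function now defined?
echo $name(), "\n";               // call it through a variable function name
```

In a real source file the same definition would simply contain the raw character between "function " and "()", which is what makes it look like an unnamed function in most editors.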

I can't paste the source code and have it still show up properly, so here are some screenshots. The first is a screenshot of what's been displayed in my browser, and the second is a screenshot of the actual source code as displayed inside my text editor (TextWrangler) with Display hidden characters turned on:

Browser: [screenshot]

Source code: [screenshot]

My question: is this intentional? Should I be able to define functions/variables with non-printing characters and still have them work flawlessly? And if so, is it documented somewhere?

I couldn't find any info about it, but I (obviously) don't know everything.

Thanks!

jerdiggity
  • possible duplicate of [What are the valid characters in PHP variable, method, class, etc names?](http://stackoverflow.com/questions/17973357/what-are-the-valid-characters-in-php-variable-method-class-etc-names) – Alma Do Nov 01 '13 at 07:06
  • @AlmaDo I don't think it's a duplicate, as I cannot see a direct answer to this question there – eis Nov 01 '13 at 07:09
  • Then you'd better read that again. No offense, but I don't think there's a clearer or more general answer than that (it covers the common case and all the limitations) – Alma Do Nov 01 '13 at 07:11
  • @AlmaDo that answer explains what is allowed and what is not. It does not explain whether that is intentional, which is what was asked here. – eis Nov 01 '13 at 07:17

1 Answer

From the manual:

Function names follow the same rules as other labels in PHP. A valid function name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*.
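
The quoted pattern can be tried out directly with preg_match(). This is a sketch under a few assumptions: the helper name looks_like_valid_label is invented here, the \A and \z anchors are added so the whole name must match, and PCRE only approximates what PHP's own lexer does.

```php
<?php
// Hypothetical helper: applies the manual's label pattern to a candidate name.
// No /u modifier is used, so PCRE compares raw bytes rather than code points.
// This approximates the documented rule; it is not PHP's actual lexer.
function looks_like_valid_label(string $name): bool
{
    $pattern = '/\A[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\z/';
    return preg_match($pattern, $name) === 1;
}

var_dump(looks_like_valid_label('a1'));            // letter first
var_dump(looks_like_valid_label('1a'));            // digit first
var_dump(looks_like_valid_label("\xE2\x80\xA9")); // bytes of U+2029
```

The third call passes because each of the three bytes falls inside the \x7f-\xff range, even though together they form a single non-printing character.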

As explained in the other answer linked, the regular expression is applied byte by byte, without regard for encoding, which allows "many weird Unicode names".
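
To make the byte-by-byte point concrete, the sketch below looks at the bytes of the character from the question and matches the same character class twice: once on raw bytes and once with PCRE's /u modifier, which treats the subject as UTF-8 code points. The /u variant is included purely for contrast and is an assumption about how an encoding-aware check might behave; PHP itself does not work that way.

```php
<?php
$name = rawurldecode('%E2%80%A9'); // U+2029 encoded as UTF-8

// One invisible character, but three bytes, each in the 0x7f-0xff range.
echo strlen($name), ' bytes: ', bin2hex($name), "\n";

$class = '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*';

// Byte mode: every byte of the name is tested against the class on its own.
$byteWise = preg_match('/\A' . $class . '\z/', $name) === 1;

// UTF-8 mode (/u): the name is a single code point, U+2029, outside \x7f-\xff.
$charWise = preg_match('/\A' . $class . '\z/u', $name) === 1;

var_dump($byteWise, $charWise);
```

The name is accepted only in byte mode, which is consistent with the behaviour described in the question.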

Doing it that way has some side effects, like the one you've seen. However, I can't imagine that was the original intent of the people behind PHP; more likely it is just a direct consequence of the way they implemented it.

eis
  • I really suggest you re-read NikiC's post. (Hint: '_Note that this regex is applied byte-per-byte, without consideration for encoding. That's why it [also allows many weird Unicode names](http://stackoverflow.com/questions/3417180/exotic-names-for-methods-constants-variables-and-fields-bug-or-feature)_' - as a quotation from there) – Alma Do Nov 01 '13 at 07:14
  • @AlmaDo I did. It explains well what the current rules are, but it doesn't really explain whether the behaviour is *intended* or not. – eis Nov 01 '13 at 07:18
  • I don't know how to respond to that. If that is defined in PHP itself, then _yes, that's intentional_. And that post clarifies that it is _allowed in PHP_ and also shows _why_. I don't know how else to explain it if that's unclear to you. Well, we all have our opinions, so you're free to think as you wish. – Alma Do Nov 01 '13 at 07:24
  • Every implementation bug there is in PHP would also be "defined in PHP itself". The regex is intended as it is documented, but byte-for-byte comparison without regard for character set might just as well be an implementation bug. But ok, we'll just disagree here then. – eis Nov 01 '13 at 07:32