
Minimal example code:

<?php

    $avarname = 'a var value';

    function a_function_name($a_parameter = true)
    {
        // a comment
    }

    a_function_name();

Structure from using token_get_all():

T_OPEN_TAG: <?php
T_WHITESPACE:
T_VARIABLE: $avarname
T_WHITESPACE:
T_WHITESPACE:
T_CONSTANT_ENCAPSED_STRING: 'a var value'
T_WHITESPACE:
T_FUNCTION: function
T_WHITESPACE:
T_STRING: a_function_name
T_VARIABLE: $a_parameter
T_WHITESPACE:
T_WHITESPACE:
T_STRING: true
T_WHITESPACE:
T_WHITESPACE:
T_COMMENT: // a comment
T_WHITESPACE:
T_WHITESPACE:
T_STRING: a_function_name
T_WHITESPACE:

As you can see, one can detect a function definition by checking for a T_STRING, preceded by a T_WHITESPACE, preceded by a T_FUNCTION. So far, so good.
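A minimal sketch of that check, using only the built-in token_get_all(). It skips any whitespace or comments after T_FUNCTION rather than requiring exactly one T_WHITESPACE. The helper name find_function_definitions() is hypothetical, and by-reference definitions (function &name()) are not handled.

```php
<?php
// Sketch: find named function (or method) definitions by looking for
// T_FUNCTION followed, after whitespace/comments, by a T_STRING.
// find_function_definitions() is a hypothetical helper name.
function find_function_definitions(string $code): array
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $names = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // Skip whitespace and comments between "function" and the name.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j])
            && in_array($tokens[$j][0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            $j++;
        }
        // A named function has a T_STRING here; a closure has "(" instead.
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $names[] = $tokens[$j][1];
        }
    }
    return $names;
}

$code = <<<'PHP'
<?php
    $avarname = 'a var value';

    function a_function_name($a_parameter = true)
    {
        // a comment
    }

    a_function_name();
PHP;

print_r(find_function_definitions($code));
```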

However, a function call is just a T_STRING, like many other things (such as the "true" constant used as the parameter's default value), with no special symbol either before or after it.

How am I supposed to know if a T_STRING refers to a function name or something else when there is no symbol prior to it telling my interpreter what the next T_STRING is supposed to refer to?

If the answer is that I need to check whether a function exists with the T_STRING's name, does that mean there cannot be a function called true(), since it would conflict with the "true" constant? If I need to make such a check, it complicates things in many different ways...

PHP Joe
Well, that's just the difference between tokenizer and parser. One breaks up the code into sections. The other must look at context and relation. – mario Mar 10 '20 at 08:12

1 Answer


What token_get_all actually returns is this (with the integer token IDs already converted to their names):

  ...,
  [26]=>
  array(3) {
    [0]=>
    string(8) "T_STRING"
    [1]=>
    string(15) "a_function_name"
    [2]=>
    int(10)
  }
  [27]=>
  string(1) "("
  [28]=>
  string(1) ")"
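A small sketch that prints a listing like the one above, assuming only the built-in token_get_all() and token_name(); the helper describe_tokens() is hypothetical. Single-character tokens such as "(" and ")" come back as plain strings rather than [id, text, line] arrays, so a loop that only prints the array entries silently drops them.

```php
<?php
// Sketch: describe every token, including single-character tokens that
// token_get_all() returns as plain strings. describe_tokens() is hypothetical.
function describe_tokens(string $code): array
{
    $lines = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens are [int id, string text, int line].
            [$id, $text, $line] = $token;
            $lines[] = sprintf('%s: %s (line %d)', token_name($id), json_encode($text), $line);
        } else {
            // Plain-string tokens: "(", ")", ";", "=", "{", "}", ...
            $lines[] = 'char: ' . json_encode($token);
        }
    }
    return $lines;
}

echo implode("\n", describe_tokens("<?php\na_function_name();\n")), "\n";
```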

token_get_all only does the tokenisation; it doesn't parse the tokens into a logical AST (abstract syntax tree). The next step would be to look at how the tokens fit together and what logical units they form. Here you'd parse the three consecutive tokens "a_function_name", "(" and ")" as a function call.
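A heuristic sketch of that idea (not a real parser): each T_STRING is labelled by the significant tokens immediately before and after it. The function names classify_strings() and next_significant() and the label strings are hypothetical; a real parser also has to handle namespaces, attributes, nullsafe calls and much more.

```php
<?php
// Heuristic sketch, not a real parser: label each T_STRING using the
// significant tokens around it. All names here are hypothetical.

// Index of the nearest token in direction $step (+1 or -1) that is not
// whitespace or a comment, or null if there is none.
function next_significant(array $tokens, int $i, int $step): ?int
{
    for ($j = $i + $step; $j >= 0 && $j < count($tokens); $j += $step) {
        $t = $tokens[$j];
        if (is_array($t) && in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            continue;
        }
        return $j;
    }
    return null;
}

function classify_strings(string $code): array
{
    $tokens = token_get_all($code);
    $result = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_STRING) {
            continue;
        }
        $p = next_significant($tokens, $i, -1);
        $n = next_significant($tokens, $i, +1);
        $prev = $p === null ? null : $tokens[$p];
        $next = $n === null ? null : $tokens[$n];

        if (is_array($prev) && $prev[0] === T_FUNCTION) {
            $kind = 'function definition';                  // function NAME(...)
        } elseif (is_array($prev)
            && in_array($prev[0], [T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_NEW], true)) {
            $kind = 'method, static member or class name';  // $x->NAME, X::NAME, new NAME
        } elseif ($next === '(') {
            $kind = 'function call';                        // NAME(...)
        } else {
            $kind = 'constant or other name';               // e.g. true
        }
        $result[] = [$token[1], $kind];
    }
    return $result;
}

$code = <<<'PHP'
<?php
    $avarname = 'a var value';

    function a_function_name($a_parameter = true)
    {
        // a comment
    }

    a_function_name();
PHP;

foreach (classify_strings($code) as [$name, $kind]) {
    echo $name, ': ', $kind, "\n";
}
```

On the question's snippet this labels the first a_function_name as a function definition, true as a constant or other name, and the last a_function_name as a function call. The deciding signal for the call is the "(" that follows the name, which is context from the token stream rather than a lookup of which functions exist.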

You may want to use an existing PHP Parser, instead of reinventing this step from scratch.

deceze