
A system we will be using has a function called "uses". If you are familiar with Pascal, the uses clause is where you tell your program what dependencies it has (similar to C and PHP includes). The function gives us more control over file inclusion than include(_once) or require(_once) alone.

As part of testing procedures, I need to write a dependency visualization tool for statically loaded files.

Statically Loaded Example: uses('core/core.php','core/security.php');

Dynamically Loaded Example: uses('exts/database.'.$driver.'.php');

I need to filter out dynamic load cases because the code is tested statically, not while running.
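To make the distinction concrete, here is a small sketch of how the two examples look to the tokenizer. The helper name `token_names` is made up for illustration; only `token_get_all` and `token_name` are real PHP functions. A static argument shows up as a lone T_CONSTANT_ENCAPSED_STRING between the delimiters, while the dynamic one also contains "." and T_VARIABLE tokens.

```php
<?php
// Hypothetical helper (not part of the system described above): returns the
// token names of a PHP snippet, skipping the open tag and whitespace.
function token_names($code) {
    $names = array();
    foreach (token_get_all("<?php " . $code) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_WHITESPACE || $token[0] === T_OPEN_TAG) {
                continue;
            }
            $names[] = token_name($token[0]);
        } else {
            $names[] = $token; // single-character tokens such as ( ) , . ;
        }
    }
    return $names;
}

// Statically loaded example
print_r(token_names("uses('core/core.php','core/security.php');"));
// Dynamically loaded example
print_r(token_names("uses('exts/database.'.\$driver.'.php');"));
```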

This is the code I'm using at this time:

$inuses=false;   // whether currently in uses function or not
$uses=array();   // holds dependencies (line=>file)
$tknbuf=array(); // last token
foreach(token_get_all(file_get_contents($file)) as $token){
    // detect uses function
    if(!$inuses && is_array($token) && $token[0]==T_STRING && $token[1]=='uses')$inuses=true;
    // detect uses argument (dependency file)
    if($inuses && is_array($token) && $token[0]==T_CONSTANT_ENCAPSED_STRING)$tknbuf=$token;
    // detect the end of uses function
    if($inuses && is_string($token) && $token==')'){
        $inuses=false;
        isset($uses[$tknbuf[2]])
            ? $uses[$tknbuf[2]][]=$tknbuf[1]
            : $uses[$tknbuf[2]]=array($tknbuf[1]);
    }
    // a new argument (dependency) is found
    if($inuses && is_string($token) && $token==',')
        isset($uses[$tknbuf[2]])
            ? $uses[$tknbuf[2]][]=$tknbuf[1]
            : $uses[$tknbuf[2]]=array($tknbuf[1]);
}

Note: It may help to know that I'm using a state engine to detect the arguments.

My issue? Since all sorts of arguments can be passed to the function, it is very difficult to get right. Maybe I'm not using the right approach, but I'm pretty sure token_get_all is the best tool in this case. So maybe the issue is my state engine, which really isn't that good. I might be missing the easy way out, so I thought I'd get some peer review on it.

Edit: I took the approach of explaining what I'm doing this time, but not exactly what I want. Put simply, I need an array of the arguments passed to a function named "uses". I'm specific about the arguments, though: I only need plain string literals, nothing dynamic at all (constants, variables, function calls...).

Christian
  • Might I ask why not just use autoloading of classes? – Mchl Nov 22 '10 at 08:29
  • Really, forget about the regex=evil meme. This **is** a use case for them. – mario Nov 22 '10 at 08:35
  • @Mchl - Because this does not specifically concern classes. @Mario - Admittedly, I'm not so good with regexes. Either way, a regex to parse PHP code would be hard to create and maintain, as well as quite slow to run. – Christian Nov 22 '10 at 08:36
  • Even if you parse the tokens, you will probably have to take short cuts. For example, `uses('foo'.'bar.php')` is static, but it's not a simple string constant. Also, `uses(foo($a,'bar.php'))` could be problematic because you might pick up `bar.php`. You'd have to write some sort of recursive (or equivalent) algorithm that understood the PHP grammar. And if you're going to take short cuts on that, I think a regexp is your best option... – Matthew Nov 22 '10 at 08:53
  • @konforce - I consider the first version as dynamic. Ever set values for property definitions? Like: `class a { public $a="123"; }`. You can't use "." or any other operator there, not even variables. As to regexp, I'd use it on the function call itself, but the problem is I'm working on whole files, so the regexp would have to understand all of the PHP grammar. I'd rather use the existing tokens than write my own parser to produce them. – Christian Nov 22 '10 at 08:58
  • My point is simply that you are imposing rules over which subset of the PHP language your parser will work based on your knowledge of how the `uses` function is called. Thus, if you're okay with a non-perfect solution, it will be much more trivial to write such a solution with regexp. I'll provide an answer for you to compare. I agree that a parsed tokens solution is "cleaner," but I think you're prematurely writing off the alternatives. – Matthew Nov 22 '10 at 09:14
  • @konforce - I'm just saying that having to write a whole PHP parser in regexp is really difficult. Don't forget that in the case of regexp, one has to take care of `uses()` showing up as HTML, comments or inside strings. – Christian Nov 22 '10 at 09:19
  • Commented code is a valid concern, the other ones I think not so much (for this particular problem). But if you can get a parser working as you would like it (and it sounds like you have), then I'd definitely go that route. – Matthew Nov 22 '10 at 09:23

2 Answers


OK I got it working. Just some minor fixes to the state engine. In short, argument tokens are buffered instead of put in the uses array directly. Next, at each ',' or ')' I check if the token is valid or not and add it to the uses array.

$inuses=false;   // whether currently in uses function or not
$uses=array();   // holds dependencies (line=>file)
$tknbuf=array(); // last token
$tknbad=false;   // whether last token is good or not
foreach(token_get_all(file_get_contents($file)) as $token){
    // detect uses function
    if(!$inuses && is_array($token) && $token[0]==T_STRING && $token[1]=='uses')$inuses=true;
    // token found, put it in buffer
    if($inuses && is_array($token) && $token[0]==T_CONSTANT_ENCAPSED_STRING)$tknbuf=$token;
    // end-of-function found check buffer and throw into $uses
    if($inuses && is_string($token) && $token==')'){
        $inuses=false;
        if(count($tknbuf)==3 && !$tknbad)isset($GLOBALS['uses'][$file][$tknbuf[2]])
                ? $GLOBALS['uses'][$file][$tknbuf[2]][]=$tknbuf[1]
                : $GLOBALS['uses'][$file][$tknbuf[2]]=array($tknbuf[1]);
        $tknbuf=array(); $tknbad=false;
    }
    // end-of-argument check token and add to $uses
    if($inuses && is_string($token) && $token==','){
        if(count($tknbuf)==3 && !$tknbad)isset($GLOBALS['uses'][$file][$tknbuf[2]])
            ? $GLOBALS['uses'][$file][$tknbuf[2]][]=$tknbuf[1]
            : $GLOBALS['uses'][$file][$tknbuf[2]]=array($tknbuf[1]);
        $tknbuf=array(); $tknbad=false;
    }
    // if the current token is not a simple string, flag the argument as bad
    if($inuses && is_array($token) && $token[0]!=T_CONSTANT_ENCAPSED_STRING)$tknbad=true;
}

Edit: Actually it is still faulty (a different issue though). But the new idea I've had ought to work out nicely.

Christian
  • Once in a `uses` function, I'd repeat: a) read token, b) if token == '(', add one to counter. c) if counter > 0: if ')', then decrease counter else ignore. d) if counter == 0: if ')' done. if ',' start over; else add token to list. If the current list (after step d) is nothing but a single constant string, you can add it to the dependencies. Again, not fool proof, but probably good enough. – Matthew Nov 22 '10 at 09:41
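
The loop described in the comment above could be sketched roughly as follows. This is only an illustration, not the code either poster ended up with: the function name `find_static_uses` is made up, whitespace and comment tokens are skipped so they never mark an argument as dynamic, and nesting is tracked for parentheses and square brackets only. It assumes `uses` is called as a plain function; method calls named `uses` would be picked up too.

```php
<?php
// Sketch: collect the arguments of uses(...) calls that consist of exactly
// one string literal. Returns array(line => array(file, ...)).
function find_static_uses($source) {
    $uses   = array();
    $state  = 0;        // 0 = outside, 1 = saw "uses", 2 = inside argument list
    $depth  = 0;        // nesting depth of ( and [ inside the current call
    $arg    = array();  // significant tokens of the current argument
    $ignore = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);

    foreach (token_get_all($source) as $token) {
        if (is_array($token) && in_array($token[0], $ignore, true)) {
            continue;
        }
        if ($state === 0) {
            if (is_array($token) && $token[0] === T_STRING
                    && strtolower($token[1]) === 'uses') {
                $state = 1;
            }
            continue;
        }
        if ($state === 1) {
            // "uses" only counts as a call when "(" follows it
            if ($token === '(') {
                $state = 2; $depth = 0; $arg = array();
            } else {
                $state = 0;
            }
            continue;
        }
        // state 2: a top-level "," or ")" ends the current argument
        if ($depth === 0 && ($token === ',' || $token === ')')) {
            if (count($arg) === 1 && is_array($arg[0])
                    && $arg[0][0] === T_CONSTANT_ENCAPSED_STRING) {
                // strip the quotes; escape sequences are left untouched
                $uses[$arg[0][2]][] = substr($arg[0][1], 1, -1);
            }
            $arg = array();
            if ($token === ')') {
                $state = 0;
            }
            continue;
        }
        if ($token === '(' || $token === '[') {
            $depth++;
        } elseif ($token === ')' || $token === ']') {
            $depth--;
        }
        $arg[] = $token;
    }
    return $uses;
}
```

Called as `find_static_uses(file_get_contents($file))`, it keeps the line=>files layout of the `$uses` array from the question. Arguments such as `'foo'.'bar.php'` are treated as dynamic, in line with the rule stated in the comments on the question.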

Using regular expressions:

<?php
preg_match_all('/\buses\s*\((.+)\s*\)/',
  file_get_contents('uses.php'), $matches, PREG_SET_ORDER);

foreach ($matches as $set) {
  list($full, $match) = $set;

  echo "$full\n";

  // try to remove function arguments
  $new = $match;
  do {
    $match = $new;
    $new = preg_replace('/\([^()]*\)/', '', $match);
  } while ($new != $match);

  // iterate over each of the uses() args
  foreach (explode(',', $match) as $arg) {
    $arg = trim($arg);
    // keep only arguments that are a single quoted string literal
    if (strlen($arg) >= 2 && ($arg[0] == "'" || $arg[0] == '"') && substr($arg, -1) == $arg[0]) {
      echo "  ".substr($arg, 1, -1)."\n";
    }
  }
}
?>

Running against:

uses('bar.php', 'test.php', $foo->bar());
uses(bar('test.php'), 'file.php');
uses(bar(foo('a','b','c')), zed());

Yields:

uses('bar.php', 'test.php', $foo->bar())
  bar.php
  test.php
uses(bar('test.php'), 'file.php')
  file.php
uses(bar(foo('a','b','c')), zed())

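Obviously it has limitations and assumptions (a comma inside a string literal splits the argument, a call spread over several lines is not matched, and commented-out calls are picked up), but if you know how the code is called, it could be sufficient.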
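
One limitation raised in the comments on the question is that commented-out calls are matched as well. A possible workaround, sketched here under the assumption that running the tokenizer once per file is acceptable, is to drop comment tokens before applying the regular expression. The function name `strip_php_comments` is made up for this example.

```php
<?php
// Hypothetical pre-processing step: rebuild the source without comments so
// that a commented-out uses() call is not seen by the regular expression.
function strip_php_comments($source) {
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                // keep the line breaks so line numbers stay aligned
                $out .= str_repeat("\n", substr_count($token[1], "\n"));
                continue;
            }
            $out .= $token[1];
        } else {
            $out .= $token;
        }
    }
    return $out;
}

$source = "<?php\n// uses('old.php');\nuses('new.php'); /* uses('gone.php') */\n";
preg_match_all('/uses\s*\((.+)\s*\)/',
  strip_php_comments($source), $matches, PREG_SET_ORDER);

foreach ($matches as $set) {
  echo $set[0]."\n";
}
```

The remaining limitations (commas inside strings, multi-line calls) are untouched by this step.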

Matthew