I'm developing a "plug 'n play" system in which individual components can be registered and associated with an uploaded file through the application GUI.
But to be truly "plug 'n play", the application must recognize the component, and since each component is a class, I could accomplish this by using interfaces.
But how can I validate the contents of an uploaded file by searching for a specific interface?
My first thought was to use the Tokenizer, but this proved to be harder than I expected. A simple test component file like this:
<?php
class ValidComponent implements Serializable {
public serialize() {}
public unserialize( $serialized ) {}
}
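After being passed through token_get_all(), with the token IDs converted to their names, it resulted in the following (single-character tokens such as braces and parentheses show up as U):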
Array
(
[0] => Array
(
[0] => T_OPEN_TAG
[1] => <?php
[2] => 1
)
[1] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 2
)
[2] => Array
(
[0] => T_CLASS
[1] => class
[2] => 3
)
[3] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 3
)
[4] => Array
(
[0] => T_STRING
[1] => ValidComponent
[2] => 3
)
[5] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 3
)
[6] => Array
(
[0] => T_IMPLEMENTS
[1] => implements
[2] => 3
)
[7] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 3
)
[8] => Array
(
[0] => T_STRING
[1] => Serializable
[2] => 3
)
[9] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 3
)
[10] => U
[11] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 3
)
[12] => Array
(
[0] => T_PUBLIC
[1] => public
[2] => 5
)
[13] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 5
)
[14] => Array
(
[0] => T_STRING
[1] => serialize
[2] => 5
)
[15] => U
[16] => U
[17] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 5
)
[18] => U
[19] => U
[20] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 5
)
[21] => Array
(
[0] => T_PUBLIC
[1] => public
[2] => 6
)
[22] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 6
)
[23] => Array
(
[0] => T_STRING
[1] => unserialize
[2] => 6
)
[24] => U
[25] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 6
)
[26] => Array
(
[0] => T_VARIABLE
[1] => $serialized
[2] => 6
)
[27] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 6
)
[28] => U
[29] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 6
)
[30] => U
[31] => U
[32] => Array
(
[0] => T_WHITESPACE
[1] =>
[2] => 6
)
[33] => U
)
Not only is this inefficient, because real components might be much bigger and produce huge arrays, but I also don't think it's very reliable.
I could certainly take this structure and search it recursively, looking for the name of a specific interface, but this would give me false positives if the interface name appears anywhere else in the code (comments, regular strings...).
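For illustration, here is a minimal sketch of a narrower token scan. Instead of matching the interface name anywhere in the array, it only collects names that sit between a T_IMPLEMENTS token and the opening brace of the class, so names inside comments or string literals (which arrive as T_COMMENT or T_CONSTANT_ENCAPSED_STRING) are skipped. The function name findImplementedInterfaces() is hypothetical, and the sketch assumes PHP 7.1 or later. It does not resolve namespaces or `use` aliases, and it does not prove that the class really fulfils the interface.

```php
<?php
// Hypothetical helper (not part of PHP): lists the names found in
// `implements` clauses of the given source without executing it.
function findImplementedInterfaces(string $source): array
{
    // Token types that can carry (part of) a class or interface name.
    // The T_NAME_* constants only exist on PHP 8.0+, so look them up defensively.
    $nameTypes = [T_STRING, T_NS_SEPARATOR];
    foreach (['T_NAME_QUALIFIED', 'T_NAME_FULLY_QUALIFIED'] as $constant) {
        if (defined($constant)) {
            $nameTypes[] = constant($constant);
        }
    }

    $interfaces = [];
    $collecting = false; // true while between `implements` and the opening `{`
    $current    = '';    // name being assembled from one or more tokens

    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // Single-character tokens arrive as plain strings.
            if ($collecting && ($token === ',' || $token === '{')) {
                if ($current !== '') {
                    $interfaces[] = ltrim($current, '\\');
                }
                $current = '';
                if ($token === '{') {
                    $collecting = false;
                }
            }
            continue;
        }

        [$id, $text] = $token;
        if ($id === T_IMPLEMENTS) {
            $collecting = true;
            $current    = '';
        } elseif ($collecting && in_array($id, $nameTypes, true)) {
            $current .= $text;
        }
    }

    return $interfaces;
}

// Usage: the comment and the string literal both mention Serializable,
// but only the names in the implements clause are reported.
$source = <<<'CODE'
<?php
// This comment mentions Serializable.
class Demo implements Countable, \IteratorAggregate {
    public function label() { return 'Serializable'; }
}
CODE;

print_r(findImplementedInterfaces($source));
```

Because the file is only tokenized and never included, nothing in the upload is executed. The trade-off is that this checks what the source declares, not what PHP would actually accept when loading the class.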
I would like to avoid text comparison or regular expressions if possible, but I don't know whether it's possible to create an isolated sandbox in which to evaluate the uploaded file so that I can use Reflection.