2

I need to convert SQL with old way or inserting parameters into the query into a new way where parameters are replaced by question marks (?) and are passed separately to to the query handler - see examples of "old" and "new" below.

I have on the order or 1200 of such SQL statements with various parameters and various number of parameters, and I would like to convert them all to the new away.

Is this something I have to create a custom parser for or are there tools that will allow me to easily do a bulk conversion?

Non-parameterized Queries (aka Old)

$product = "widget";
$price = 10.00;
$sql = "SELECT description
    FROM resource.product
    WHERE
        product.model = '" . db_input($product) . "'
        and product.price = '" . db_input($price) . "'
    ";
$result = db_query($sql);

Parameterized Queries (aka New)

$product = "widget";
$price = 10.00;
$sql = "SELECT description
    FROM resource.product
    WHERE
        product.model = ?
        and product.price = ?
    ";
$result = db_param_query($sql, [$product, $price]);

Note that two blocks differ in the bottom 4 lines.

Dennis
  • 7,907
  • 11
  • 65
  • 115

1 Answers1

1

What you need is a Program Transformation System (PTS). A PTS is a tool that can parse source code to compiler data structures (e.g., abstract syntax trees or ASTs), can apply transformations to the AST that represent desired changes, and can then re-generate valid source code from the modified ASTs.

A good PTS will allow you to specify the langauge to be transformed using a grammar, and will allow you to code tree modifications with source-to-source rewrite rules, that are essentially of the form:

**when** you see *this*, replace it by *that*, **if** condition(*this*)

where this and that are patterns written using the syntax of the language being transformed, and condition can inspect the matched pattern for additional constraints.

In OP's case, I'm guessing he is using PHP (telltales: "$" as a prefix on variable names, "." used for concatenation operator). So he'll need a good PTS and an accurate grammar for PHP.

In OP's, he has a double grammar problem: he want to transform not only the PHP code that glues the fragments of the SQL strings together, but he also wants to modify the SQL strings themselves. Arguably he needs the PTS to also parse the SQL string fragments, and then apply a transform that modifies both the PHP and the SQL strings at the same time. If we make the assumption that the SQL strings are always assembled by the legacy program by concatenating string fragments which always represent SQL chunks between parameters, then we can avoid this double-parsing problem.

The second probem is knowing that a string represents SQL string fragments. Consider the following code:

  $A=1; $B=10;
  echo  "SELECT number from '" . $A . "' to '" . $B . "'";

This looks very much like a real select statement, but it is not; we don't want to apply any transforms to this code. In general, you can't know that an assembled string is really a SQL string or just something that looks like one. We'll assume that all concatenated strings that respectively terminate and start with "'" are SQL strings.

Our DMS Software Reengineering Toolkit is a PTS that can solve this problem; it even has an available PHP grammar. One needs roughly the following DMS rewrite rule:

rule fix_legacy_SQL_parameter_passing(s1: STRING, v, DOLLARVAR, s2:STRING):
          expression -> expression=
 " \s1 . db_input(\v) . \s2 " 
 -> " \concatenate3\(\allbutlastcharacter\(\s1\)\,"?"\
                     \allbutfirstcharacter\(\s2\)"
    if last_character_is(s1,"'") and first_character_is(s2,"'");

The rule is named fix_legacy_SQL_parameter_passing to allow us to distinguish it from many other rules we may have. The parameters s1 and s2 represent metavariables that match subtrees of the specified (non)terminal type. expression->expression tells DMS that the rule applies to expressions only.

The this pattern is " \s1 . db_input(\v) . \s2 "; the " is a metaquote that separates DMS rewrite rule syntax from PHP syntax. \s1, \v and \s2 use \ to indicate a metavariable, the pattern in effect says "if you can find a concatenation of two literal strings with an intervening dbinput function having a variable name as an argument then..."

Following the second -> is the that pattern; it is pretty complicated because we want do some some computation on the matched strings. To do so, it uses metafunctions written as

\fnname\( arg1 \,  arg2 \, ...  \)

to compute a new tree from the trees bound to the pattern variables by a match. Note the \ escapes to distinguish the eleements of the metafunction call from the syntax of the target language. I hope the purpose of the set of metafunctions I suggest using is clear; they have to be coded as custom auxiliary support for this rule. The rule ends with a trailing ";".

It should be clear that this rule patches the SQL string to replace the squotes by a "?" in the constructed string.

But, wait, oops... we didn't collect the db_input variables.

We can do this two ways: a hidden accumulator (not shown here becuase it would look like magic), or a clumsier but easier to rewrite tag.

A tag for DMS is a tree that contains whatever we want it to contain; it usually indicates that we have an intention to do further work, and we'll need additional rewrite rules to do that work. First we introduce the definition of a tag tree:

 pattern accumulated_db_variable( vars:expression, computed:expression) :expression = TAG;

This makes accumulated_db_variable such a tag, with two children, the first intended to be a list of variable names, and the second an arbitrary expression.

Now we revise our rule above:

rule fix_legacy_SQL_parameter_passing(s1: STRING, v, DOLLARVAR, s2:STRING):
          expression -> expression=
 " \s1 . db_input(\v) . \s2 " 
 -> " \accumulated_dbinputs\([\v]\, 
                             \concatenate3\(\allbutlastcharacter\(\s1\)\,"?"\
                                            \allbutfirstcharacter\(\s2\)\)"
    if last_character_is(s1,"'") and first_character_is(s2,"'");

This rule computes the revised SQL string, but also computes the set of dbinput variables found in that string, and wraps this pair of trees in tag. The bad news is now we have tags in the middle of the expression; but, we can write additional rules to get rid of them by combining the tags when they are close to one another:

 rule merge_accumulated_dbinputs(vars: element_list,
                                 v: DOLLARVAR,
                                 e: expression):
     expression -> expression =
  " \accumulated_dbinputs\([\vars]\,
                           \accumulated_db_inputs\([\v]\,e\)\)"
   -> "\accumulated_dbinputs\([vars,v]\,\e)";

We need a rule to move the set of collected variables to the following statement as suggested by OP:

 rule finalize_accumlated_dbinputs(lhs1: DOLLARVAR,
                                    vars: element_list,
                                    query: expression,
                                    lhs2: DOLLARVAR)
     statements -> statements =
  " \lhs1 = \accumulated_dbinputs\([\vars],\query);
    \lsh2 = db_param_query(\lhs1,[\vars]);

If his code has more variability than this allows, he may have to write additional rules.

Finally, we need to glue this set of rules together and give it a name:

ruleset fix_legacy_SQL { fix_legacy_SQL_parameter_passing, merge_accumulated_dbinputs, finalize_accumlated_dbinputs }

With this, we can invoke DMS on a file and tell it to apply the ruleset until exhaustion.

What this set of rules should do [I'm showing expected output] to the OP's example is transform it through a series of steps:

$sql = "SELECT description
        FROM resource.product
        WHERE
           product.model = '" . db_input($product) . "'
           and product.price = '" . db_input($price) . "'
        ";
$result = db_query($sql);

-> ("transformed to"):

$sql =  TAG_accumulated_dbinputs([$product],
       "SELECT description
        FROM resource.product
        WHERE
           product.model = ?
           and product.price = '" . db_input($price) . "'
        ");
$result = db_query($sql);

-> ("transformed to"):

$sql =  TAG_accumulated_dbinputs([$product],
           TAG_accumulated_dbinputs([$price],
        "SELECT description
        FROM resource.product
        WHERE
           product.model = ?
           and product.price = ?
        "));
$result = db_query($sql);

-> ("transformed to"):

$sql =  TAG_accumulated_dbinputs([$product,$price],
        "SELECT description
        FROM resource.product
        WHERE
           product.model = ?
           and product.price = ?
        ");
$result = db_query($sql);

-> ("transformed to"):

$sql =  "SELECT description
        FROM resource.product
        WHERE
           product.model = ?
           and product.price = ?
        ";
$result = db_param_query($sql,[$product,$price]);

Wooof. Untested, but I think this is pretty close to right.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341