0

I am making an advanced search page on my website using PHP prepared statements to query a MySQL database MyISAM table using a Boolean Full-Text search.

It is very much like the first section of the Advanced Search provided by Google and covers the following criteria:

  • All of these words
  • Exact word or statement (NB: This is where the problem comes in)
  • Any of these words
  • None of these words

I retrieve each input value, clean and process each part of the collection of strings, then append the appropriate information to form the mySQL query via the prepared statement.

So in essence for the following search:

  • All -
  • Exact -
  • Any - seagate toshiba
  • None -

Would output as this string:

seagate* toshiba*

The query would result in something like this:

SELECT id, description
FROM `items` 
WHERE MATCH (description)
AGAINST ('seagate* toshiba*' IN BOOLEAN MODE)

Which would list all rows with the words "seagate" followed by anything and "toshiba" followed by anything in the description field.

This works fine, as does an output of:

-(750gb*) -(320gb*) seagate* toshiba*

Which would list all rows as above but exclude any rows with "750gb" and "320gb" in the description field.

By adding a value to the "All of these words" string we would get an output of:

+(16mb*) +(7200rpm*) -(750gb*) -(320gb*) seagate* toshiba*

Which would list all rows as above but show only downs that included both "16mb" and "7200rpm" in the description field.

Now for the problematic part. If i were to make use of the "Exact word of statement" string and add the value "serial ata 600" we would get an output of:

+(16mb*) +(7200rpm*) -(750gb*) -(320gb*) +("serial ata 600") seagate* toshiba*

Running this string and resulting query through as a sql query using phpmyadmin I get a result set of 2 rows matching the criteria for the search.

However, when running this on my website i get a result of 6 rows which would indicate that +("serial ata 600")" is being ignored entirely.

If i enter only a value for the string "Exact word of statement" so that we would get an output of:

+("serial ata 600")

The results would indicate that this string would list all rows that contained either "serial" or "ata" or "600".

By running the same query directly in mysql this result would list all rows that contained exactly the words "serial ata 600".

In the MySQL definition of this operator it states:

A phrase that is enclosed within double quote (“"”) characters matches 
only rows that contain the phrase literally, as it was typed.

This is the case in MySQL but when running the same query with PHP as a Prepared Statement returns a different result set.

Here is the the prepared statement:

if ($result = $link->prepare("
    SELECT id, description
    FROM `items` 
    WHERE MATCH (description)
    AGAINST (? IN BOOLEAN MODE)
"))
{
    $result->bind_param("s", $pattern);
    ... ETC
}

Here is the output of $pattern directly before this:

+("serial ata 600")

Could anyone possibly suggest the reason for this behavior as i do not see any reason for there to be any difference in the way things work here between PHP and MySQL.

I can provide any additional code relating to how the string is generated on request but the output is as it is in my example.

Any suggestions/advice/input/feedback or comment would be greatly appreciated.

Community
  • 1
  • 1
Craig van Tonder
  • 7,497
  • 18
  • 64
  • 109

1 Answers1

2

This is one place where prepared statements fall flat on their faces. Internally, the preparation engine will be doing the equivalent of:

$quoted = mysql_real_escape_string('+("serial ata 600")');

which gives you the equivalent of

+(\"serial ata 600\")

Now you're not using a 3-word quote phrase anymore, you're sending in the following SEPARATE words:

+("serial

ata

600")

This is because the " quotes are SQL metacharacters, and you NEED them to be treated as metacharacters. However, because they ARE metacharacters, the prep engine will quote them, reducing them to normal plain-jane quotes, and now they're no longer enclosing your search phrase. They've become PART of the search phrase.

Don't know if this would actually work, but you may have to rewrite the prepared statement to be more like

... MATCH AGAINST (CONCAT('("', ?, '")'))
Marc B
  • 356,200
  • 43
  • 426
  • 500
  • Hi Marc, thanks so much for your input! It actually does seem to work that way but its more complicated to implement. In doing so though, you got me on the right track as far as chain of thought goes and i was able to solve the issue although i am not quite sure how. I changed the coalition to UTF-8 bin and converted all the values to uppercase which is more uniform in any case. Now however, it seems that my query works without using concat! Thanks for your help!!! – Craig van Tonder May 22 '13 at 00:49
  • `the equivalent of: $quoted = mysql_real_escape_string('+(\"serial ata 600\")');` I had actually been "cleaning" the input text using `mysqli_real_escape_string()` which produced the result as you has described, however before output of `$pattern` the result was being cleaned again, producing the result as i has described. The double quotes must be escaped as you described in your example of: `+(\"serial ata 600\")`. Without the escaping, the result will return 3 separate values as you have also suggested. Thanks so much for the better understanding! – Craig van Tonder May 22 '13 at 01:40