0

I've got a real time index containing information on people (a definition is included below). The problem is that I'm trying to run an exact match on a phone number and email address and no matter what I try, I'm getting matches even if the database column values contains what I've searched for, not where the column value exactly matches.

The query I'm using is:

SELECT id, first_name,last_name,email_personal, phone_number, WEIGHT() as relevance FROM people WHERE MATCH('@(phone_number,email_personal) "^+447111$" "^myemail@gmail\.com$ "');

That returns rows that contains a full phone number (i.e. +44711122334), as far as I understand it, shouldn't, it should be trying to match "^+447111$" as the start & end of the field?

I've also tried this test query and have much the same issue, apart from the fact it returns a lot more matches, as it would do it was matching any of the field values containing the criteria, rather than the whole field value. The values aren't the full values I'm looking for, but this is a test as it should be matching rows that only have a phone number of "+447711" and email of "@gmail.com", which don't exist in the database, but it does return rows, where the phone number starts with +447711 and the email has @gmail.com in it.

SELECT id, first_name,last_name,email_personal,phone_number, WEIGHT() as relevance FROM people WHERE MATCH('@phone_number "^+447711$" @email_personal "^@gmail\.co$"') ORDER BY relevance DESC;

Just to confirm, I'm trying to find matches where the values of the fields match the exact text, i.e. this would be the SQL query (and yes, this doesn't work either!)

SELECT id,first_name,last_name,email_personal,phone_number FROM people WHERE phone_number = '+44711122334' AND email_personal = 'myemail@gmail.com';

Config:

index people
{
                type = rt
                path = /var/local/sphinx/indexes/ppl/

                rt_field = first_name
                rt_field = last_name
                rt_field = phone_number
                rt_field = email_personal
                stored_fields = first_name,last_name,phone_number,email_personal
                rt_mem_limit = 512M

                expand_keywords = 1
                min_prefix_len = 2
                min_word_len = 2
                index_exact_words = 1
}
Steve Childs
  • 1,832
  • 2
  • 20
  • 26

1 Answers1

0

bah! This is always the way. You spend hours trying to figure it out, post it to StackOverflow and then within a few moments the answer jumps out at you.

It turns out it was the 'expand_keywords' setting in the config that was responsible. For those who don't know, this is what it does...

Queries against indexes with expand_keywords feature enabled are internally expanded as follows. If the index was built with prefix or infix indexing enabled, every keyword gets internally replaced with a disjunction of keyword itself and a respective prefix or infix (keyword with stars). If the index was built with both stemming and index_exact_words enabled, exact form is also added. Here's an example that shows how internal expansion works when all of the above (infixes, stemming, and exact words) are combined:

running -> ( running | *running* | =running )

So that despite trying to search for exact matches, that was causing it to always expand and search for the text within the column, not that the column exactly matched.

Taking that line out of the config & restarting Sphinx solved the issue straight away, you don't even need to re-index, which is good.

I thought I'd leave the question and answer here incase anyone else has a similar "issue" ;)

Steve Childs
  • 1,832
  • 2
  • 20
  • 26