0

I am trying to match a phrase in a document which unfortunately in the catalog I have is formatted like this:

Includes sides?** *No*

or

Includes sides?** *Yes*

searching for either one fails e.g.

Search idx_test where MATCH('"Includes sides?** *Yes*"')

If I search on

Search idx_test where MATCH('"Includes sides"')

it does not fail (but finds both)

and if I as a test change the data itself to:

Includes sides No

I can find it with Sphinx

Search idx_test where MATCH('"Includes sides No"')

So clearly the ? and the * need to be escaped. Yet nothing I've done works e.g using \*

user3649739
  • 1,829
  • 2
  • 18
  • 28

2 Answers2

0

Try.,.

...where MATCH('"Includes sides\\? \\*Yes\\*"')

Need multiple, one as you in a sql string, the second for the extended syntax query.

If you also writing it as a string in language like php, might need more quotes, to escape in php string.

Can also just remove them from query, they only syntax in querying so don't need to change data.

(Alough slightly more complicated if using phrase_boundary)

barryhunter
  • 20,886
  • 3
  • 30
  • 43
  • Starting here: `where MATCH('"Includes sides\\?"')` this does not return `work from home?` but it does return any `work from home` followed by a character e.g. `work from home_`. – user3649739 Nov 13 '16 at 15:30
  • In terms of removing from the query I did try `where MATCH('"Includes sides Yes"')` assuming they were never even indexed but that is a fail too. Hard to solve since in theory I could just do `Includes sides?` if it were any character BUT ? – user3649739 Nov 13 '16 at 15:31
  • For now I just did `"Includes Sides" !(("Includes sides" NEAR/2 "No") ("Includes sides" >> "No"))`. – user3649739 Nov 21 '16 at 17:20
0

Well I tried a # of solutions that did not work:

  1. I added regexp_ in the config to map ?=>qmark and *=>asterisk. I had no luck with this regardless of how I searched i.e. Includes sidesqmark and Includes sides qmark (latter assuming ? treated as separate work

  2. I commented the regexp back out and tried searching on Includes Sides No on the logic that ? and * are not being indexed so those spaces don't exist.

  3. I added spaced between Includes sides and No assuming that one or more of the chars was not indexing but somehow taking up a char space e.g. Includes sides No.

  4. I added double escapes as was suggested to me `"Includes sides\?\*\* \*No" which failed too.

None of these worked.

I tried doing NEAR/ and >> together and that worked:

("Includes sides" NEAR/2 "No") ("Includes sides" >> "No")

which seemed clunky.

Finally I did this which worked

"Includes sides \\*"No"

Which works but is a total mystery to me why though I am posting it since it solves the problem. I'm hoping someone else can explain the mechanisms behind this:

  • Clearly the \\ is the right way to escape in sphinxql
  • Clearly escaping the asterisk is working
  • Clearly escaping the ? is not
  • Yet if the asterisk needs to be escaped then how come:

"Includes sides\\?\\*\\* \\*No

does not work AND

"Includes sides\\*\\* \\*No

does not work? If it is not indexing/recognizing the ? but clearly is the * then I'd need to escape the two after Includes. Yet it is as if neither the ? nor the * after Includes exist BUT the asterisk before No does.

Sorry to include a question an answer yet my solution worked. If there is a better way for me to both provide the solution I used and the question that it revealed please let me know as I'm trying not to confuse the issue here.

user3649739
  • 1,829
  • 2
  • 18
  • 28
  • Did you check if using `phrase_boundary`? if you have `?` in there, it would complicate phrase matches. Also check if ? and * are in charset_table. In your case would still recommend them NOT being in charset_table. – barryhunter Nov 14 '16 at 14:00