
I was using regexp_filter to do replacements in a client selection that is powered by Sphinx. I tried to make the regexp more efficient, so instead of

regexp_filter=Dr(.)? Jones=>Doctor Louis
regexp_filter=Dr(.)? Smith=>Doctor Alban

I did

regexp_filter=(?i)Dr(.)? (Jones|Smith)=>Doctor \2

However, this gives me unexpected results: searching for 'Dr Jobes' returns all Dr X records (e.g. Dr Jones, Doctor Gleason, Dr Proctor).

My expectation was that each record in the table containing Dr X would simply be indexed with Dr rewritten to Doctor and the last name (\2) left intact. Instead, it seems to turn whatever terms I put in the (A|B) alternation into a generic match.

This seems inconsistent with how this approach has worked in the past, so I am wondering if I am missing something obvious in the regexp.

BTW if I test this in myregextester it works as expected:

https://www.myregextester.com/?r=07efb8b2
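For reference, the substitution that the regex tester performs can be reproduced with a small Python sketch. This is only an approximation: Python's `re` module stands in for the RE2 engine that Sphinx uses for regexp_filter, and the helper name `apply_filter` and the sample strings are made up for illustration. It shows what the rule does to a piece of text, not how Sphinx applies it at index or query time.

```python
import re

# The combined rule from the question, split into pattern and replacement.
# Assumption: Python's `re` behaves like Sphinx's RE2 for this simple pattern.
PATTERN = re.compile(r"(?i)Dr(.)? (Jones|Smith)")
REPLACEMENT = r"Doctor \2"


def apply_filter(text: str) -> str:
    """Hypothetical helper: apply the rule the way a plain regex replace would."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    # Sample inputs are illustrative only.
    for sample in ["Dr Jones", "Dr. Smith", "Dr Jobes", "Dr Proctor"]:
        print(f"{sample!r} -> {apply_filter(sample)!r}")
```

In this sketch only the names listed in the alternation are rewritten ('Dr Jones' becomes 'Doctor Jones'), while 'Dr Jobes' and 'Dr Proctor' pass through unchanged, which is the behaviour I expected from Sphinx as well.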

user3649739
  • Are you sure this is a problem with the filter? It might just be a misunderstanding of Sphinx queries, i.e. a query of 'Dr Jobes' will look for any documents with those two words, in any order. Could the document for Doctor Gleason contain "Dr" and "Jobes" somewhere else? You can also use SHOW META to look at the actual keywords after the query is processed, to see what the regexp_filter did to the query! – barryhunter Dec 07 '16 at 18:03
  • @barryhunter Well, these are single-term fields, not documents. So if a field is 'Dr Jones' then as far as I know MATCH('Dr Smith') will not find 'Dr Jones' even if I did `regexp_filter=Dr (Smith|Jones)=>Doctor \1`. I do get that I'd need to do `^Dr Jones$` if I wanted to search the entire field, but that would only be relevant when I was searching on 'Dr' – user3649739 Dec 07 '16 at 19:17
  • @barryhunter For what it is worth these were the original settings: `enable_star = 1`, `min_prefix_len = 1`, `mlock = 1` – user3649739 Dec 09 '16 at 22:52
  • @barryhunter I believe there is an issue in regexp_replace using 'OR' pipes. Whenever I try to do anything like `regexp_filter=Term (A|B|C)=>Term2 \1` I get different results than if I do each term discretely, e.g. `Term A=>Term2 A`. In the latter case both `Term A` and `Term2 A` will match any result; in the former case ONLY the remapped `Term2 A` will. – user3649739 Dec 10 '16 at 15:55
