I made a basic regexp_filter in Sphinx to remove apostrophes:

regexp_filter=(\w+)\'s=>\1

However, I discovered that it does not work on curly apostrophes, e.g.

books’s

even if I do

(\w+)[\'’]s

This seems to be because Sphinx uses a plain-text file for its configuration and appears not to differentiate between the two apostrophes. In other words, while the above regex works in any regex editor, it is not recognized when Sphinx parses it from the configuration file as a regexp_filter.
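
For reference, here is a minimal sketch of the behaviour I am after, written in Python rather than Sphinx. It only illustrates the pattern: Python's `re` module is not the engine Sphinx uses, so it says nothing about how Sphinx parses the config file, and the helper name `strip_possessive` is made up for this example.

```python
import re

# A word, then a straight (') or curly (U+2019) apostrophe, then "s".
# The curly apostrophe is written as the escape \u2019 so the pattern
# does not depend on the encoding of the file that contains it.
POSSESSIVE = re.compile(r"(\w+)['\u2019]s")


def strip_possessive(text: str) -> str:
    """Drop a trailing 's or ’s from each word, keeping the word itself."""
    return POSSESSIVE.sub(r"\1", text)


print(strip_possessive("books's"))       # books
print(strip_possessive("books\u2019s"))  # books
```

Both the straight and the curly form reduce to `books` here, which is the result I expect from the filter.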

Is there some special character or escape I can use instead in the regexp? Naturally, I'd prefer that to a global database replace.

user3649739
  • Try `(\w+)[\'’]s` – Wiktor Stribiżew Mar 19 '17 at 19:34
  • @WiktorStribiżew Sorry Wiktor, I should elaborate (will update the question): I can do so in regex editors, but I am using a Sphinx configuration file, which seems to interpret that as a straight ' regardless. I'm wondering if there is a way to use the character code for the curly one instead; looking into that now. – user3649739 Mar 19 '17 at 19:41
  • Try to use its code. Something like `\u2019` or `\x{2019}` should be used instead of a literal `’` – Wiktor Stribiżew Mar 19 '17 at 19:44
  • @WiktorStribiżew The second one is perfect `(\w+)\x{2019}`. Do you want to post as an answer so I can accept it? Thank you! – user3649739 Mar 19 '17 at 19:49
  • @WiktorStribiżew Dang! I spoke too soon. The character is right and works in a regex editor, but Sphinx's regexp_filter still doesn't recognize it. – user3649739 Mar 19 '17 at 19:58

0 Answers