I asked this previously as a regex question, but the issue appears to lie with Sphinx, since even a correct regex set-up failed there. So: is there a way to convert curly apostrophes in the index to straight ones? I have tried:

  • Exceptions.text: ’ => '
  • regexp using the actual character: regexp_filter=(\w+)\’s=>\1
  • regexp using the unicode: regexp_filter=(\w+)\x{2019}s=>\1

And nothing has worked. I get that the literal ’ may fail because these are text files, but I'm not sure why the Unicode escape would. In any event, the ’ is breaking things for me and I'd hate to run a MySQL replace on the entire database just for this.

user3649739
  • Have you considered http://sphinxsearch.com/docs/current.html#conf-ignore-chars – barryhunter Mar 20 '17 at 19:45
  • @barryhunter I tried `ignore_chars = U+2019` with no luck. I'm still forced to remove it manually. So far Sphinx does not recognize it either as the explicit character `’` or as the Unicode code point `U+2019`. In an ideal world I'd be able to remap U+2019 to the standard `'`, but at this point I would just love to remove it in the index, as my last option is a massive MySQL replace for each `’` – user3649739 Mar 22 '17 at 13:32
  • If the regex is not working right, maybe the data reaching indexer (which is when regexp_filter is actually applied) is not actually proper UTF-8 encoded? Are you using SET NAMES in a sql_query_pre, for example? – barryhunter Mar 23 '17 at 17:30
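
One way to test the encoding theory from the last comment is to look at the raw bytes of an affected value outside Sphinx. The sketch below is not part of Sphinx; `classify_apostrophe` is a hypothetical helper, and it assumes you can get a sample field out of the database as bytes. In UTF-8 the character U+2019 is the three bytes `E2 80 99`, while in Windows-1252 it is the single byte `92`, which a UTF-8 aware filter would not match as U+2019.

```python
def classify_apostrophe(raw: bytes) -> str:
    """Report how a curly apostrophe (U+2019) is encoded in raw bytes.

    Hypothetical diagnostic helper, not a Sphinx API.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: a lone 0x92 byte is U+2019 in Windows-1252.
        return "cp1252" if b"\x92" in raw else "invalid"
    if "\u2019" in text:
        return "utf-8"
    # UTF-8 bytes misread as Windows-1252 and re-encoded show up as
    # the three characters U+00E2 U+20AC U+2122.
    if "\u00e2\u20ac\u2122" in text:
        return "double-encoded"
    return "absent"


sample = "John\u2019s"
print(classify_apostrophe(sample.encode("utf-8")))
print(classify_apostrophe(sample.encode("cp1252")))
```

If the result is anything other than `utf-8`, the mismatch would be consistent with the character never being seen as U+2019 at indexing time, which is what the `SET NAMES` question is probing.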
