
I am using regexp_filter in Sphinx to replace terms.

In most cases this is straightforward. Misspellings, for example, are easy:

regexp_filter = Backround => Background

Even swapping terms with capturing-group notation works:

regexp_filter = (Left)(Right) => \2\1

However, I am having more trouble when I use a pattern match to find a given word I want to replace:

 regexp_filter = (PatternWord1|PatternWord2)\W+(?:\w+\W+){1,6}?(SearchTerm)\b => NewSearchTerm

Here NewSearchTerm is the term I want to substitute for \2 only, leaving \1 and the rest of the matched text alone.

So if I had the text 'Pizza and Taco Parlor', then:

regexp_filter = (Pizza)\W+(?:\w+\W+){1,6}?(Parlor)\b => Store

should convert it to 'Pizza and Taco Store'.

I know that in this case the SearchTerm is \2, but I am not sure how to convert it. I know I could append to it, e.g. \2s to make it plural, but how can I actually replace it, given that it is just one capturing group of several and I only want to replace that group?
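To make the problem concrete, here is a minimal sketch in Python rather than Sphinx. It assumes that Python's `re.sub` treats a plain replacement string the same way regexp_filter does, which is to substitute the whole match. Sphinx uses a different regex engine, so treat this only as an illustration of the behaviour described above.

```python
import re

# The filter from the question, transcribed for Python's re module
# (an assumption: the real filter runs inside Sphinx, not Python).
PATTERN = r"(Pizza)\W+(?:\w+\W+){1,6}?(Parlor)\b"

text = "Pizza and Taco Parlor"

# A plain replacement string substitutes the entire match, not just group 2,
# so the leading words that were only meant as context are lost as well.
whole_match_replaced = re.sub(PATTERN, "Store", text)
print(whole_match_replaced)
```

Because the pattern matches the whole phrase, the context words disappear along with the target word, which is not the intended 'Pizza and Taco Store'.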

user3649739
  • What do you want to replace it with? Give us expected inputs => outputs. There is no problem replacing only one group if that is what you are asking. – ndnenkov Dec 26 '15 at 18:03
  • @ndn Sorry if it wasn't clear. I updated the question, and here it is in a comment: regexp_filter = (PatternWord1|PatternWord2)\W+(?:\w+\W+){1,6}?(SearchTerm)\b => NewSearchTerm – user3649739 Dec 26 '15 at 19:18

1 Answer


If I understand the question, you have strings that match the following criteria:

  1. Begin with PatternWord1 or PatternWord2
  2. Followed by one or more non-word characters (\W+), such as a space
  3. Followed by between 1 and 6 intervening words, each with its trailing non-word characters ((?:\w+\W+){1,6}?); if those words are letters only, [A-Za-z]+ can stand in for \w+
  4. Followed by "SearchTerm"

Let's use this as a baseline:

PatternWord1 Hello SearchTerm

And you only want to replace SearchTerm in the string.

A replacement substitutes the entire match, so you need another capturing group around everything you want to keep, and the replacement must emit that group again:

regexp_filter = ((PatternWord1|PatternWord2)\W+(?:\w+\W+){1,6}?)(SearchTerm)\b => \1World

Your capturing group matches would be:

  1. "PatternWord1 Hello " (including the trailing space)
  2. PatternWord1
  3. SearchTerm

Your result would be:

PatternWord1 Hello World
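The same idea can be checked outside Sphinx. The sketch below applies the outer-group approach to the 'Pizza and Taco Parlor' example from the question using Python's `re` module. This is an assumption made for illustration: Sphinx uses its own regex engine, and the variable names here are hypothetical, but the `\1` backreference in the replacement behaves the same way in this case.

```python
import re

# Group 1 wraps everything to keep; group 3 is the word being replaced.
PATTERN = r"((Pizza)\W+(?:\w+\W+){1,6}?)(Parlor)\b"

text = "Pizza and Taco Parlor"

# Inspect what each capturing group holds for this input.
match = re.search(PATTERN, text)
kept_prefix, trigger_word, target_word = match.groups()

# \1 re-emits the kept prefix, then the new term takes the place of group 3.
result = re.sub(PATTERN, r"\1Store", text)
print(result)
```

Here `kept_prefix` holds the trigger word, the intervening words, and the trailing space, so only the final word changes in `result`.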

cynicaljoy
  • The thing is, I want to keep everything but replace the SearchTerm. I'm using the rest simply to recognize when I want to replace SearchTerm. As per my (newly added) example, if I have 'Pizza and Taco Parlor' and want (in this case) to make Parlor equivalent to Store, I make the inner group a capturing group. So basically I want to end up with \1\2NewTerm (which is currently failing) – user3649739 Dec 26 '15 at 19:27
  • You should first ensure that the expression you're using matches the string you want. I recommend using [RegExr](http://regexr.com/) to validate that. Once you get that right, you should experiment with the pattern groups as I recommended. – cynicaljoy Dec 26 '15 at 19:32
  • I did in fact use Regex101 to make sure they match, but your suggestion to combine both of the prior terms (PatternWord1 Hello) into one group worked. Thanks! – user3649739 Dec 26 '15 at 19:34