
I am using Lucene.net and am trying to implement a SynonymFilter to provide expanded terms when items within my database of products can be named differently, or spelled differently - e.g. "spanner" > "wrench", or "lawnmower" > "lawn mower".

As a test I set up a SynonymMap as follows:

String base1 = "lawnmower";
String syn1 = "lawn mower";
String base2 = "spanner";
String syn2 = "wrench";

SynonymMap.Builder sb = new SynonymMap.Builder(true);
sb.Add(new CharsRef(base1), new CharsRef(syn1), true);
sb.Add(new CharsRef(base2), new CharsRef(syn2), true);
SynonymMap smap = sb.Build();

Searching for "spanner" or "wrench" brings back all terms with either word in. Searching for "lawn mower" or "lawnmower" only brings back terms that match exactly the input search criteria.

Is there something else that needs to be done for multi-word phrases within the synonyms?

Also, how do I expand this to 3 or more terms, for example "lawnmower", "lawn mower", "mower", "grass cutter"?

Thanks

chilluk

1 Answer


There is an example of multi-word synonyms in the unit tests. You have to split the words yourself and insert a SynonymMap.WORD_SEPARATOR (null character) between them. To make this easier, there is a Join method on SynonymMap.Builder.

String base1 = "lawnmower";
String syn1 = "lawn mower";

SynonymMap.Builder sb = new SynonymMap.Builder(true);
CharsRef syn1Chars = sb.Join(Regex.Split(syn1, " +"), new CharsRef());
sb.Add(new CharsRef(base1), syn1Chars, true);
SynonymMap smap = sb.Build();

Here is an extension method to make quick work of this.

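using System.Text.RegularExpressions;
using Lucene.Net.Analysis.Synonym;
using Lucene.Net.Util;

public static class SynonymMapBuilderExtensions
{
    // One or more spaces separate the words of a phrase.
    private static readonly Regex Space = new Regex(" +", RegexOptions.Compiled);

    // Adds a mapping in which either side may be a multi-word phrase.
    public static void AddPhrase(this SynonymMap.Builder builder, string input,
        string output, bool keepOrig)
    {
        // Join inserts SynonymMap.WORD_SEPARATOR between the words of each phrase.
        CharsRef inputRef = builder.Join(Space.Split(input), new CharsRef());
        CharsRef outputRef = builder.Join(Space.Split(output), new CharsRef());
        builder.Add(inputRef, outputRef, keepOrig);
    }
}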

You can then use this extension method whether the synonym has spaces or not, and you don't have to bother with creating the CharsRef objects if you don't need them anywhere else in your code.

String base1 = "lawnmower";
String syn1 = "lawn mower";
String base2 = "spanner";
String syn2 = "wrench";

SynonymMap.Builder sb = new SynonymMap.Builder(true);
sb.AddPhrase(base1, syn1, true);
sb.AddPhrase(base2, syn2, true);
SynonymMap smap = sb.Build();
NightOwl888
  • Thanks that seems to work an absolute treat! When I want 3 or more terms to all map to each other how do I expand to 3 or more terms for example "lawnmower", "lawn mower", "mower", "grass cutter"? Do I have to map each variation to each other? – chilluk Nov 14 '17 at 13:54
  • Not sure, but it seems reasonable that is the only way it could work. You could make adding all combinations simpler by adding all synonyms to a list and then using [this combinations extension method](https://stackoverflow.com/a/32479803/) to add all of the mappings. – NightOwl888 Nov 14 '17 at 15:46
  • So I will have to add a > b, b > a, a > c, c > a, b > c and c > b? Am I looking to do this in both the built index and the incoming query? In my source data I could have different variations of the term, and obviously I cannot predict how people will search for it. Or is it good enough to only process the query to look for all the alternate terms? Do I retain the original value when adding the synonym? I can't "see" what is being created to know what is going on under the hood so I can work out best approach. – chilluk Nov 15 '17 at 11:37
  • I am afraid I don't know enough about how it works to give you an answer, although I have read somewhere that you can store the synonyms at index time for better performance. I suggest asking a question on the [lucene mailing list](https://lucene.apache.org/core/discussion.html) (not Lucene.Net) and be sure to inform them you are using Lucene 4.8. Unfortunately, the [documentation](https://lucene.apache.org/core/4_8_0/analyzers-common/index.html) on the subject is quite scant, but there are a few articles on using synonyms that come up when searching for "lucene multiple synonyms". – NightOwl888 Nov 15 '17 at 12:54
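
On the follow-up about three or more terms: if every term in a group should map to every other one, the pairs listed in the comments (a > b, b > a, a > c, ...) can be generated rather than typed out. Here is a minimal sketch in plain C#, with no Lucene.NET dependency. `SynonymPairs` and `AllOrderedPairs` are hypothetical names made up for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class SynonymPairs
{
    // Hypothetical helper (not part of Lucene.NET): yields every ordered
    // (input, output) pair of terms at different positions in the group,
    // so a group of n terms produces n * (n - 1) pairs.
    public static IEnumerable<(string Input, string Output)> AllOrderedPairs(
        IReadOnlyList<string> terms)
    {
        if (terms == null) throw new ArgumentNullException(nameof(terms));

        for (int i = 0; i < terms.Count; i++)
        {
            for (int j = 0; j < terms.Count; j++)
            {
                // Skip mapping a term to itself.
                if (i != j)
                {
                    yield return (terms[i], terms[j]);
                }
            }
        }
    }
}
```

For the four terms in the question this yields 12 pairs. Each pair could then be passed to something like `sb.AddPhrase(pair.Input, pair.Output, true)`, on the assumption that multi-word terms on the input side are joined with `SynonymMap.WORD_SEPARATOR` in the same way as the output side.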