I have problems getting synonyms with more than one term to work.
To illustrate my problem, I have created a minimal index with four items describing hotels, loosely based on the hotels-example from the Azure Cognitive Search documentation.
{
  "value": [
    {
      "Id": "1",
      "Title": "Fancy stay, luxury, hotel, wifi, break fast"
    },
    {
      "Id": "2",
      "Title": "Roach Motel, budget, motel, internet, morning meal"
    },
    {
      "Id": "3",
      "Title": "Mediocre Inn, cheap, bed & breakfast, wi-fi, breakfast"
    },
    {
      "Id": "4",
      "Title": "Ok Stay, cost efficient, bed and breakfast, wi fi, breakfast"
    }
  ]
}
Each hotel item describes the same types of amenities but in an unnormalized way. For example, they all offer internet access, but each uses a different term for it:
- wifi
- internet
- wi-fi
- wi fi
Users searching for hotels will be equally unnormalized. We want a search for any one of the above terms to return all of the above as matches.
We can submit a synonym map to do this:
{
  "format": "solr",
  "synonyms": "wifi,wi-fi,internet,wi fi"
}
Synonyms defined with commas as separators are two-way synonyms. This means any of the terms is equivalent to any of the other terms. The exception is wi fi, which does not work as expected because it consists of more than one token.
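Since we will need many such groups, the map body can be generated from lists of equivalent terms. Below is a minimal sketch in Python (our real code is C#). The function name build_synonym_map is hypothetical, and I am assuming that the Solr format expects one rule per line, with rules separated by a newline character. The body only contains the two fields shown above.

```python
import json


def build_synonym_map(groups):
    """Build a synonym map body from groups of equivalent terms.

    Each group becomes one comma-separated (two-way) rule.
    Assumption: rules are separated by newlines in the Solr format.
    """
    rules = [",".join(group) for group in groups]
    return {"format": "solr", "synonyms": "\n".join(rules)}


if __name__ == "__main__":
    body = build_synonym_map([["wifi", "wi-fi", "internet", "wi fi"]])
    print(json.dumps(body, indent=2))
```

With the single group above, this produces the same body as the hand-written map.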
QUERIES
- wifi: returns all 4, as expected
- internet: returns all 4, as expected
- wi-fi: returns all 4, as expected
- wi fi: returns only 2 hits (the ones with wi-fi and wi fi)
I understand that the problem is that the query wi fi is split into two separate tokens, wi and fi. Neither token on its own is an entry in the synonym map, so synonym lookup does not expand the query.
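To check my understanding, here is a toy model in Python that reproduces the four results above. It is only my mental model, not the actual Azure implementation. I am assuming that the query is split on whitespace before synonym lookup, that each synonym is matched as a phrase, and that the analyzer splits text on anything that is not a letter or digit. All names are hypothetical.

```python
import re

DOCS = {
    "1": "Fancy stay, luxury, hotel, wifi, break fast",
    "2": "Roach Motel, budget, motel, internet, morning meal",
    "3": "Mediocre Inn, cheap, bed & breakfast, wi-fi, breakfast",
    "4": "Ok Stay, cost efficient, bed and breakfast, wi fi, breakfast",
}
SYNONYM_GROUP = ["wifi", "wi-fi", "internet", "wi fi"]


def analyze(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively in doc_tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def search(query):
    """Match any query term; synonyms are looked up per whitespace-separated term."""
    hits = set()
    for term in query.split():  # the split happens BEFORE synonym lookup
        in_map = term.lower() in SYNONYM_GROUP
        alternatives = SYNONYM_GROUP if in_map else [term]
        for alternative in alternatives:
            phrase = analyze(alternative)
            for doc_id, title in DOCS.items():
                if contains_phrase(analyze(title), phrase):
                    hits.add(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    for q in ["wifi", "internet", "wi-fi", "wi fi"]:
        print(q, "->", search(q))
    # wifi, internet and wi-fi each give ['1', '2', '3', '4'];
    # wi fi gives ['3', '4'] because "wi" and "fi" are looked up separately.
```

In this model, wi fi only matches the documents whose analyzed title contains the tokens wi and fi, which is what I observe.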
WORKAROUND
A known workaround is to turn the query into a phrase query by wrapping it in quotes, so it becomes "wi fi".
- "wi fi": returns all 4 hits, as expected
However, the end-user query may consist of multiple terms, like
hotel affordable wi fi breakfast
So I cannot wrap the entire query in quotes, as it would then match nothing. Can anyone suggest a workaround to get the built-in synonym functionality to work for this use case? Many similar cases also require multi-term synonyms to work, for example:
- affordable, cost efficient, cheap
- break fast, breakfast, morning meal
- ...
PS: We are using the SDK to index content. We have extensive pre-processing of content, using regular C# to manipulate the content and data model as we wish. The same goes for the front end, where we manipulate the query using code we control.
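Because we control the query in code, one direction I have considered is to quote only the known multi-word synonyms inside the user's query before sending it. Below is a minimal sketch in Python for brevity (the real code would be C#). The function name and the term list are hypothetical, and whether the service expands a quoted phrase inside a longer query the same way it does for a standalone "wi fi" is an assumption I have not verified.

```python
import re

# Hypothetical list: the multi-word entries taken from our synonym groups.
MULTI_WORD_TERMS = ["wi fi", "cost efficient", "break fast", "morning meal"]


def quote_multi_word_terms(query, terms=MULTI_WORD_TERMS):
    """Wrap known multi-word terms in quotes; leave everything else untouched."""
    if not terms:
        return query
    # Longest terms first, so a longer term wins over a shorter overlapping one.
    alternatives = [
        r"\s+".join(re.escape(word) for word in term.split())
        for term in sorted(terms, key=len, reverse=True)
    ]
    # The first branch consumes phrases the user already quoted.
    pattern = re.compile(
        r'"[^"]*"|\b(?:' + "|".join(alternatives) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        text = match.group(0)
        if text.startswith('"'):
            return text  # already a phrase query, keep as is
        return '"' + " ".join(text.split()) + '"'

    return pattern.sub(replace, query)


if __name__ == "__main__":
    print(quote_multi_word_terms("hotel affordable wi fi breakfast"))
    # hotel affordable "wi fi" breakfast
```

The term list would have to be kept in sync with the synonym map, which is the part I like least about this approach.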
Any creative suggestions are welcome.