I have a business requirement to support verb conjugation in French-language search. For example, if a user searches for "Être", the search should also find the other forms of the verb (voice, mood, tense, etc.).

Based on what I have seen, the Azure Search fr.microsoft analyzer (or a custom analyzer built on top of it) supports this. I verified it by searching for "Être" and finding documents containing: est, EST, sera, sont and etre.

It does not, however, find documents with the following: ete, etes, Ete, Etes.

I searched and found this page which documents the simple and compound forms of Être. http://conjugator.reverso.net/conjugation-french-verb-%C3%AAtre.html

It does not look like the Microsoft French language analyzer supports all of them. Is this true? If so, how do I ensure all of them are handled? Do I need to add "ete" and "etes" as synonyms for "Être"? If so, would I also need to add "Ete" and "Etes" as synonyms for "Être"?

Is there a way for me to get documentation on all the French conjugation support in Azure Search?

Last but not least, how do I better understand ALL the conjugations of "Être"? I tried using the Analyze API...

{ "analyzer": "fr.microsoft", "text": "Être" }

But I only get the following response:

{
  "@odata.context": "https://one-adscope-search-poc2.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
  "tokens": [
    {
      "token": "etre",
      "startOffset": 0,
      "endOffset": 4,
      "position": 0
    },
    {
      "token": "être",
      "startOffset": 0,
      "endOffset": 4,
      "position": 0
    }
  ]
}

1 Answer

In Azure Search, our linguistic analyzers use normalized forms to match different conjugations of the word. For example, at indexing time, the Microsoft analyzer analyzes the word 'sont' to 'etre' and indexes both the original and the normalized/lemmatized form of the word. At query time, say you are issuing a search query with 'est'. The word 'est' also analyzes to 'etre' and finds the document containing 'sont'. The responses from the Analyze API you shared align with this expectation.
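To make the matching behaviour concrete, here is a minimal Python sketch of lemma-based indexing and querying. It is a toy model, not the actual analyzer: the `LEMMAS` table is a hypothetical stand-in filled in only from the behaviour reported in this thread, and `analyze`, `build_index` and `search` are illustrative names.

```python
# Toy model of lemma-based matching. LEMMAS is a hypothetical stand-in for
# the analyzer, inferred only from the behaviour reported in this thread.
LEMMAS = {
    "est": "etre",
    "sont": "etre",
    "sera": "etre",
    "etre": "etre",
    "être": "etre",
    "ete": "ete",
    "etes": "ete",
}


def analyze(word):
    """Return the terms for a word: its lowercased form plus its lemma."""
    surface = word.lower()
    return {surface, LEMMAS.get(surface, surface)}


def build_index(docs):
    """Map every analyzed term to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for term in analyze(word):
                index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents sharing at least one analyzed term with the query."""
    hits = set()
    for word in query.split():
        for term in analyze(word):
            hits |= index.get(term, set())
    return hits


docs = {1: "ils sont ici", 2: "vous etes ici"}
index = build_index(docs)
print(search(index, "est"))   # document 1 matches through the shared term 'etre'
print(search(index, "Être"))  # document 2 is missed because 'etes' maps to 'ete'
```

Because both the original word and its normalized form are indexed, an exact search for 'sont' still works, while 'est' and 'Être' reach the same document through 'etre'.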

Unfortunately, we don't provide an exhaustive list of conjugations in our documentation. You may be able to generate the list yourself by running a sample of your documents through the Analyze API and grouping the words by the tokens it returns.
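One way to build such a list is sketched below in Python, under a few assumptions: the service name, index name and API key are placeholders you supply, the `api-version` default is the one mentioned in this thread, and `analyze_remote` and `group_by_token` are hypothetical helper names. Lowercasing words before analysis is a simplification of mine, not something the service requires.

```python
import json
import re
import urllib.request
from collections import defaultdict


def analyze_remote(word, service, index_name, api_key,
                   analyzer="fr.microsoft", api_version="2016-09-01"):
    """Call the Analyze API for one word and return the tokens it produces."""
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/analyze?api-version={api_version}")
    body = json.dumps({"text": word, "analyzer": analyzer}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key})
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [t["token"] for t in payload["tokens"]]


def group_by_token(texts, analyze):
    """Group the distinct words in texts by every token they analyze to."""
    groups = defaultdict(set)
    words = {w.lower() for text in texts for w in re.findall(r"\w+", text)}
    for word in sorted(words):
        for token in analyze(word):
            groups[token].add(word)
    return dict(groups)
```

Passing `lambda w: analyze_remote(w, service, index_name, api_key)` as the `analyze` argument groups your sample words by token. Words that share a token such as 'etre' will match each other at query time, and words that land in a different group (here, 'ete') are candidates for a synonym rule.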

Finally, you can use our synonyms feature to fill the gap. I noticed that the words that are not matching (ete, etes, Ete, Etes) all analyze to the base form 'ete'. You can define a synonym rule that says 'etre' and 'ete' are equivalent. The synonyms feature is currently in private preview. Feel free to reach out to me at nateko AT microsoft if you want to try it out.
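As a rough sketch of what such a rule could look like, the Python below builds a synonym map definition in the Solr-style format described in the synonyms documentation. The map name `french-verbs` and the helper `synonym_map_body` are hypothetical, and because the feature is in preview the exact request shape and API version may differ from what is shown here.

```python
import json


def synonym_map_body(name, equivalent_terms):
    """Build a synonym map definition with one Solr-format equivalence rule."""
    return {
        "name": name,
        "format": "solr",
        # A comma-separated list declares all listed terms equivalent.
        "synonyms": ", ".join(equivalent_terms),
    }


# 'french-verbs' is a hypothetical map name chosen for illustration.
body = synonym_map_body("french-verbs", ["etre", "ete"])
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent to the service's synonym maps endpoint, and the map is then referenced by name from the searchable field that should use it.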

Hope this helps.

Nate

Nate Ko
  • Thank you. I am still a little confused why synonyms are considered preview when we are currently using synonyms in our custom analyzers via the .Net SDK. Our custom analyzer uses the Microsoft.Azure.Search.Models.SynonymTokenFilter as documented about 3/4 of the way down on this page. https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search We are using API version 2016-09-01. – Andres Becerra Apr 17 '17 at 17:07
  • Never mind Nate... I found an email from you from March that describes why the .NET SDK 2016-09-01 synonym support is not sufficient. You wrote "The SynonymTokenFilter has a compatibility issue with some language tokenizers. The dictionary to SynonymMapFilter (provided with synonyms is parsed and analyzed with standard analyzers whereas in your example, the query was parsed/analyzed with a language specific tokenizer. If the tokenized forms of a word from the two analysis are different, the query will not expand as expected. ......." – Andres Becerra Apr 17 '17 at 17:31
  • "..... There’s also an issue with multiword synonyms in SynonymsTokenFilter in custom analysis which is handled correctly in the new synonyms feature. As the new feature works by expanding the query, no re-indexing is needed. Further, unlike SynonymTokenFilter in custom analysis, updating the synonyms dictionary does not interrupt the service. " – Andres Becerra Apr 17 '17 at 17:31
  • Azure Search supports synonyms in custom analysis and via the new synonyms feature described in this documentation: https://learn.microsoft.com/en-us/azure/search/search-synonyms. The latter is easier to maintain and deploy and does not have the issues you mentioned. – Nate Ko Apr 19 '17 at 17:00
  • I have a further update, just to help anybody else reading this. Nate and Janusz helped show me that current synonym mapping works, but changes require an index rebuild. I verified this works and am looking forward to the next release where we can maintain these synonym mappings on the fly. – Andres Becerra Apr 26 '17 at 04:26