Say I just have one document in a collection

{
    _id: <whatever>,
    sound: 'Dong'
}

and a synonyms collection with only one mapping

{
    mappingType: 'explicit',
    input: ['Ding'],
    synonyms: ['Ding', 'Dong']
}

and I want to create a search index which uses those to return the one document when one queries for 'Ding' on the property sound.

In this minimal example I can just use the lucene.standard analyzer and everything works perfectly (lucene.english works as well). But changing just the analyzer definitions to lucene.keyword (or to custom analyzers, though there I might be making another mistake) breaks things, i.e. no document is returned. The definitions are pretty straightforward. The search index field definition is

  "sound": {
    "analyzer": "lucene.keyword",
    "searchAnalyzer": "lucene.keyword",
    "type": "string"
  },

and synonyms

  "synonyms": [
    {
      "analyzer": "lucene.keyword",
      "name": "synonym_mapping",
      "source": {
        "collection": "synonyms"
      }
    }
  ]
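
The question does not show the query itself. As a rough sketch, a `$search` stage that uses the synonym mapping above could look like the following when written as a Python pipeline (for example to pass to pymongo's `collection.aggregate`). The index name `default` is an assumption; `synonym_mapping` and `sound` are taken from the definitions above.

```python
import json

# Assumption: the search index is named "default"; the question does not say.
INDEX_NAME = "default"

# $search stage using the text operator with the synonym mapping
# called "synonym_mapping" from the index definition above.
pipeline = [
    {
        "$search": {
            "index": INDEX_NAME,
            "text": {
                "query": "Ding",
                "path": "sound",
                "synonyms": "synonym_mapping",
            },
        }
    }
]

# Printing only; running it would need a real Atlas cluster, e.g.
# results = list(collection.aggregate(pipeline))
print(json.dumps(pipeline, indent=2))
```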

Using MongoDB Compass to explain the query, I can see that for lucene.standard and lucene.english the explain looks slightly different (type: "DefaultQuery" and "queryType": "SafeTermAutomatonQueryWrapper" sounds like a wrapper for synonyms is used, maybe?) than for the not-working analyzers (type: "TermQuery"), but there is no documentation on what everything means.

At this point, my best guess is that either some analyzers are not supposed to work with synonyms (I couldn't find anything in the docs though, no error or warning either obviously), or the implementation to handle that case is missing.

Am I doing something wrong?

oli
  • Yes, not all analyzers support synonyms. The list of unsupported analyzers is documented at https://www.mongodb.com/docs/atlas/atlas-search/synonyms/#options The keyword one is quite special. You might have discovered an undocumented feature. In any case it's a proprietary Atlas feature, so you will be better off contacting support, as they can investigate your exact setup and provide tailored recommendations. It would be nice to have the solution posted here as an answer, though. – Alex Blex Mar 08 '23 at 14:25
  • Thank you for your comment. I read that part of the documentation as well, but there is nothing regarding lucene.keyword, nor my custom analyzers (for example just tokenizer: standard and nothing else). I contacted support as well. If I have an answer before anybody else I will post it as well. – oli Mar 08 '23 at 14:36

1 Answer

I think I somewhat understand the behavior now. The following starts with the use-case of the question with the lucene.keyword analyzer. What I think happens is the following:

  1. Query for sound: 'Ding'
  2. 'Ding' is converted to lowercase, and synonyms are looked up for 'ding'; this is the important extra step, and it is contrary to lucene.keyword behavior
  3. No synonyms are found for 'ding', so the search returns no results

So if I change my synonyms to

{
    mappingType: 'explicit',
    input: ['ding'],
    synonyms: ['Ding', 'Dong']
}

I can find documents with 'Ding' or 'Dong', but here the case matters again, because that is lucene.keyword behavior.
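
To make the hypothesis concrete, here is a toy model in plain Python. It is not how Atlas Search is implemented; it only encodes the assumption that the query is lowercased before the synonym lookup, while the final match against the stored value is exact and case-sensitive (as with lucene.keyword). The function names are made up for this sketch.

```python
def expand_query(query, mappings):
    """Return the terms to search for, under the hypothesis above."""
    key = query.lower()  # hypothesized step: the lookup key is lowercased
    for mapping in mappings:
        if mapping["mappingType"] == "explicit" and key in mapping["input"]:
            return list(mapping["synonyms"])
    # Assumption: with no matching mapping, only the query itself is searched.
    return [query]


def keyword_search(documents, query, mappings):
    """Exact, case-sensitive match of the whole field, like lucene.keyword."""
    terms = set(expand_query(query, mappings))
    return [doc for doc in documents if doc["sound"] in terms]


documents = [{"_id": 1, "sound": "Dong"}]

original_mapping = [
    {"mappingType": "explicit", "input": ["Ding"], "synonyms": ["Ding", "Dong"]}
]
lowercase_input_mapping = [
    {"mappingType": "explicit", "input": ["ding"], "synonyms": ["Ding", "Dong"]}
]

# 'ding' is not in input ['Ding'], so no synonyms apply and nothing matches.
no_hits = keyword_search(documents, "Ding", original_mapping)
# 'ding' is in input ['ding'], so 'Dong' is searched and the document matches.
hits = keyword_search(documents, "Ding", lowercase_input_mapping)

print(no_hits)
print(hits)
```

In this model a document stored as 'dong' would still not be found, which mirrors the remark that case matters again for the actual match.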

I guess it somewhat makes sense, because I read that Lucene (always?) parses queries to lowercase, but since this conflicts with the behavior of lucene.keyword it is pretty confusing, to me anyway. lucene.standard and similar analyzers are not affected, because they ignore case anyway when they look something up.

What I will use in the end is a custom analyzer that behaves like a case-insensitive lucene.keyword, since I don't care about case but otherwise want to match multi-word queries, combined with lowercase synonyms.
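
A minimal sketch of what such a setup might look like, written as Python dicts that mirror the JSON index definition. It assumes a custom analyzer built from the `keyword` tokenizer and the `lowercase` token filter; the analyzer name `keyword_lowercase` is made up here, and I have not verified this exact definition against a cluster. The small `analyze` function only emulates the intended behavior locally.

```python
import json

# Hypothetical analyzer name; tokenizer and token filter types are assumed
# to be Atlas Search's "keyword" tokenizer and "lowercase" token filter.
ANALYZER_NAME = "keyword_lowercase"

index_definition = {
    "analyzers": [
        {
            "name": ANALYZER_NAME,
            "tokenizer": {"type": "keyword"},
            "tokenFilters": [{"type": "lowercase"}],
        }
    ],
    "mappings": {
        "dynamic": False,
        "fields": {
            "sound": {
                "type": "string",
                "analyzer": ANALYZER_NAME,
                "searchAnalyzer": ANALYZER_NAME,
            }
        },
    },
    "synonyms": [
        {
            "name": "synonym_mapping",
            "analyzer": ANALYZER_NAME,
            "source": {"collection": "synonyms"},
        }
    ],
}

# Synonym document with everything in lowercase.
synonym_document = {
    "mappingType": "explicit",
    "input": ["ding"],
    "synonyms": ["ding", "dong"],
}


def analyze(text):
    """Local emulation only: one token for the whole input, lowercased."""
    return [text.lower()]


print(json.dumps(index_definition, indent=2))
print(analyze("Ding Dong"))
```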

oli