I ran into an issue when searching for several terms that include a special character (the section sign "§"). Example: AB § 32. I want every returned document to contain the words "AB" and "32" as well as the symbol "§". Some matching documents are found and others are not. A document containing the following text is found: Lagrum: 32 § 1 mom. första stycket a) kommunalskattelagen (1928:370) AB

But a document containing this text is not found: Lagrum: 32 § 1 mom. första stycket AB

For the symbol "§" I use its UTF-8 encoding, "\xc2\xa7".
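As a side check on the encoding, here is a minimal sketch, assuming the query is built in Python 3 (the post does not say which language is used). It shows that the UTF-8 bytes of "§" are indeed C2 A7, but that writing "\xc2\xa7" as a text string literal produces two characters rather than one. If the client language behaves like this, the search term sent to the server would not be a bare "§".

```python
# Assumption: Python 3 client code; the original post does not name the language.
section = "\u00a7"  # the section sign as a single code point

# The UTF-8 encoding of the section sign is the two bytes C2 A7.
utf8_bytes = section.encode("utf-8")

# Written as a *text* literal, the same escapes are two code points
# (U+00C2 and U+00A7), not the single character U+00A7.
as_str_literal = "\xc2\xa7"

# Decoding the bytes gives back the single character.
decoded = b"\xc2\xa7".decode("utf-8")

print(utf8_bytes, len(as_str_literal), decoded == section)
```

In that situation, passing the literal character "§" (or "\u00a7") in the query avoids the ambiguity.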

The index uses the "lucene.swedish" analyzer.

      "Content": [
        {
          "analyzer": "lucene.swedish",
          "minGrams": 4,
          "tokenization": "nGram",
          "type": "autocomplete"
        },
        {
          "analyzer": "lucene.swedish",
          "type": "string"
        }
      ]
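To make the "minGrams": 4 setting concrete, here is a rough sketch of plain character n-gram generation. The helper name char_ngrams and the upper bound of 7 are illustrative assumptions, and this is not how the search engine is implemented internally; it only shows that, under a plain n-gram scheme, a token shorter than the minimum length yields no grams at all.

```python
def char_ngrams(token, min_grams, max_grams):
    """Return all character n-grams of token with min_grams <= length <= max_grams."""
    grams = []
    for size in range(min_grams, max_grams + 1):
        # range() is empty when the token is shorter than size
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams

# max_grams=7 is an arbitrary value chosen for illustration only.
long_token = char_ngrams("Lagrum", 4, 7)
short_token = char_ngrams("AB", 4, 7)
symbol_token = char_ngrams("\u00a7", 4, 7)

print(long_token)
print(short_token, symbol_token)
```

Whether the real index treats short terms such as "AB", "32" and "§" exactly this way is not confirmed by the post, but it is worth checking when the query terms are shorter than minGrams.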

The query looks like this:

{
    "index": "test_index",
    "compound": {
        "filter": [
            {
                "text": {
                    "query": [
                        "111111111111"
                    ],
                    "path": "ProductId"
                }
            },
        ],
        "must": [
            {
                "autocomplete": {
                    "query": [
                        "AB"
                    ],
                    "path": "Content"
                }
            },
            {
                "autocomplete": {
                    "query": [
                        "\xc2\xa7",
                    ],
                    "path": "Content"
                }
            },
            {
                "autocomplete": {
                    "query": [
                        "32"
                    ],
                    "path": "Content"
                }
            }
        ],
    },
    "count": {
        "type": "lowerBound",
        "threshold": 500
    }
}

My question: what is wrong with the search, and how can I get the correct result (both of the documents above returned)?

Yelena

1 Answer

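Focusing only on the content field, here is an index definition that should work for your requirements. The key change is that the autocomplete mapping uses the lucene.whitespace analyzer, which splits text only on whitespace and so keeps "§" as its own token. The docs are here. Let me know if this works for you.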

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "content": [
        {
          "type": "autocomplete",
          "tokenization": "nGram",
          "minGrams": 4,
          "maxGrams": 7,
          "foldDiacritics": false,
          "analyzer": "lucene.whitespace"
        },
        {
          "analyzer": "lucene.swedish",
          "type": "string"
        }
      ]
    }
  }
}
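To illustrate why the choice of analyzer matters for "§", here is a rough approximation in Python. The regex \w+ is only a stand-in for a word-based tokenizer and is not the actual Lucene implementation; the real Swedish analyzer also lowercases and stems, which is not reproduced here. Splitting on whitespace stands in for lucene.whitespace.

```python
import re

text = "Lagrum: 32 \u00a7 1 mom. f\u00f6rsta stycket AB"

# Stand-in for a word-based tokenizer: keeps runs of letters and digits,
# so punctuation and symbols such as the section sign are discarded.
word_like_tokens = re.findall(r"\w+", text)

# Stand-in for a whitespace tokenizer: splits on whitespace only,
# so the section sign survives as its own token.
whitespace_tokens = text.split()

print(word_like_tokens)
print(whitespace_tokens)
```

Note that whitespace splitting also keeps attached punctuation ("Lagrum:", "mom."), so queries need to account for that.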
Nice-Guy