
We are using Elasticsearch for faster searching on our organization data. The data model has an organization id, address, organization name, business start date, and an array of organization contacts.
We have been asked to support both "string contains" and "string starts with" searches on the organization id and/or organization name fields. For example, organization.name:"abc*" or organization.id:"abc"

organization.name:"abc*" and organization.id:"*abc*"
organization.name:"*abc*" and organization.id:"abc*"
Since we need both kinds of search on the same field, using an n-gram analyzer alone is not working. Please advise.
arupc

1 Answer


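As far as I understand, you need to find documents where organization.name begins with abc AND organization.id contains abc (but not at the beginning).

For this, you can use a multi-field, which indexes the same field in different ways for different purposes, together with an n-gram tokenizer. In the mapping below, the main keyword field serves the "starts with" search through a prefix query, while the n-gram analyzed raw sub-field serves the "contains" search through a match query.

Here is a working example with index mapping, index data, search query, and search result.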

Index Mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 20
  },
  "mappings": {
    "properties": {
      "organization": {
        "properties": {
          "name": {
            "type": "keyword",
            "fields": {
              "raw": {
                "type": "text",
                "analyzer": "my_analyzer"
              }
            }
          },
          "id": {
            "type": "keyword",
            "fields": {
              "raw": {
                "type": "text",
                "analyzer": "my_analyzer"
              }
            }
          }
        }
      }
    }
  }
}
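
To see why the raw sub-field supports a "contains" search, here is a rough Python imitation of the tokenizer settings above (min_gram 3, max_gram 20, token_chars letter and digit). It is only a sketch: the function name ngram_tokens is made up for illustration, and the real tokenizer also tracks positions and offsets that are ignored here. Note that my_analyzer has no lowercase filter, so matching stays case-sensitive.

```python
import re


def ngram_tokens(text, min_gram=3, max_gram=20):
    """Approximate an ngram tokenizer restricted to letters and digits."""
    tokens = []
    # Any character that is not a letter or digit ends the current chunk,
    # which is roughly what token_chars: [letter, digit] does.
    for chunk in re.findall(r"[^\W_]+", text):
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(chunk):
                    break
                tokens.append(chunk[start:start + size])
    return tokens


print(ngram_tokens("defabc"))
# ['def', 'defa', 'defab', 'defabc', 'efa', 'efab', 'efabc', 'fab', 'fabc', 'abc']
print("abc" in ngram_tokens("Aspect abc Technology"))  # True
print("abc" in ngram_tokens("Aspect Technology"))      # False
```

Because "abc" is one of the indexed tokens for "defabc", a match query for "abc" on the raw sub-field finds it wherever it occurs in the value.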

Index Data:

{
  "organization": {
    "id": "abc def",
    "name": "Aspect abc Technology"
  }
}
{
  "organization": {
    "id": "defabc",
    "name": "abc Aspect Technology"
  }
}
{
  "organization": {
    "id": "abcef",
    "name": "abc Aspect Technology"
  }
}
{
  "organization": {
    "id": "abc",
    "name": "Aspect Technology"
  }
}

Search Query:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "organization.id.raw": "abc"
                }
              },
              {
                "prefix": {
                  "organization.name": "abc"
                }
              }
            ],
            "must_not": {
              "prefix": {
                "organization.id": "abc"
              }
            }
          }
        },
        {
          "bool": {
            "must": [
              {
                "prefix": {
                  "organization.id": "abc"
                }
              },
              {
                "match": {
                  "organization.name.raw": "abc"
                }
              }
            ],
            "must_not": {
              "prefix": {
                "organization.name": "abc"
              }
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "65054994",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.3590312,
        "_source": {
          "organization": {
            "id": "abc def",
            "name": "Aspect abc Technology"
          }
        }
      },
      {
        "_index": "65054994",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.0725547,
        "_source": {
          "organization": {
            "id": "defabc",
            "name": "abc Aspect Technology"
          }
        }
      }
    ]
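
As a sanity check on the result above, the logic of the bool query can be approximated in plain Python over the four sample documents. This is a simplified model rather than Elasticsearch itself: the helper names (ngrams, contains, starts_with, matches) are made up for illustration, the prefix query is modelled as str.startswith on the whole keyword value, and the match query is modelled as "any n-gram of the search term is an indexed n-gram". Scoring is ignored.

```python
import re


def ngrams(text, min_gram=3, max_gram=20):
    """Set of n-grams per letter/digit chunk, mimicking my_tokenizer."""
    grams = set()
    for chunk in re.findall(r"[^\W_]+", text):
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(chunk):
                    break
                grams.add(chunk[start:start + size])
    return grams


def contains(value, term):
    # Stand-in for a match query on the n-gram "raw" sub-field.
    return bool(ngrams(term) & ngrams(value))


def starts_with(value, term):
    # Stand-in for a prefix query on the unanalyzed keyword field.
    return value.startswith(term)


def matches(doc, term):
    org_id, name = doc["id"], doc["name"]
    # First should clause: id contains (not at the start), name starts with.
    first = contains(org_id, term) and starts_with(name, term) and not starts_with(org_id, term)
    # Second should clause: id starts with, name contains (not at the start).
    second = starts_with(org_id, term) and contains(name, term) and not starts_with(name, term)
    return first or second


docs = [
    {"id": "abc def", "name": "Aspect abc Technology"},
    {"id": "defabc", "name": "abc Aspect Technology"},
    {"id": "abcef", "name": "abc Aspect Technology"},
    {"id": "abc", "name": "Aspect Technology"},
]

hits = [doc["id"] for doc in docs if matches(doc, "abc")]
print(hits)  # ['abc def', 'defabc']
```

The third document is excluded because both fields start with abc, and the fourth because its name does not contain abc at all.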
ESCoder
  • @arupc did you get a chance to go through my answer? Looking forward to your feedback – ESCoder Nov 29 '20 at 04:51
  • Thanks much, this answer sounds promising. I will check it out and confirm. As per your explanation here, I believe it will work out. – arupc Nov 29 '20 at 04:55
  • Please note: "organization.name begins with abc AND organization.id contains abc (not in the beginning)" could also be the other way round, that is, organization.id begins with abc AND organization.name contains abc (not in the beginning) – arupc Nov 29 '20 at 04:58
  • Yes @arupc, I have included both cases in the above query – ESCoder Nov 29 '20 at 05:00
  • Thanks. Just thinking out loud: can we solve it using a query string, without using the query DSL? – arupc Nov 29 '20 at 05:14
  • @arupc you should try a solution like this one first https://stackoverflow.com/a/65021575/4604579 (i.e. use a wildcard field instead). – Val Nov 29 '20 at 05:53
  • Yes, this approach is working and I have converted all the DSL queries into Lucene queries. – arupc Dec 03 '20 at 07:46
  • @arupc glad this worked for you :) Can you please upvote and accept the answer as well – ESCoder Dec 03 '20 at 07:51
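
The thread does not show the Lucene queries mentioned in the comments, so the following is only a guess at what a query_string equivalent of the bool query could look like. The helper build_query_string is hypothetical, it assumes the search term is plain alphanumeric text (no escaping of reserved characters is done), and the resulting body has not been run against a cluster here.

```python
import json


def build_query_string(term):
    """Build a query_string body mirroring the two should clauses of the bool query."""
    # Plain terms on the n-gram sub-fields act as the "contains" checks.
    contains_id = f"organization.id.raw:{term}"
    contains_name = f"organization.name.raw:{term}"
    # A trailing * on the keyword fields acts as the "starts with" checks.
    starts_id = f"organization.id:{term}*"
    starts_name = f"organization.name:{term}*"
    query = (
        f"({contains_id} AND {starts_name} AND NOT {starts_id}) OR "
        f"({starts_id} AND {contains_name} AND NOT {starts_name})"
    )
    return {"query": {"query_string": {"query": query}}}


body = build_query_string("abc")
print(json.dumps(body, indent=2))
```

The printed body would be sent to the _search endpoint in place of the bool query, using the same index mapping.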