I'm fairly new to Elasticsearch, so you may have to bear with me. I'm running into a problem where a document appears if I search for it using 20 characters or fewer, but as soon as the query contains any more characters of the same word, I get no results:

  • Using 'phenoxymethylpenicillin' brings no documents.
  • Using 'phenoxymethylpenicil' brings back documents.

This is the query I'm trying to use:

{
    "match_phrase": {
        "genericNames.name": {
            "query": "phenoxymethylpenicillin",
            "slop": 15,
            "zero_terms_query": "NONE",
            "boost": 1.0
        }
    }
}

Here is the full query: https://pastebin.com/DEJvP2uS

Like I said, I'm fairly new to this, so it may just be that I'm not looking in the right place.

So my question is: what could cause this, and why?

Thanks!

Edit: Below is an extract from one of the documents in the sample data. I can't show much of it because a lot of it is sensitive, but luckily I can share the names from the sample data. This is from the data I'm trying to search:

"genericNames":[
{
    "nameType":1,
    "name":"Phenoxymethylpenicillin 250mg tablets",
    "nameChangeCode":"0000",
    "nameBasisCode":"0001",
    "nameTypeDescription":"Name",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
},
{
    "nameType":5,
    "name":"Penicillin V 250mg tablets",
    "nameTypeDescription":"Alternative Name 3",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
}
],

I have also provided the index mapping as it may provide extra information:

{
    "amp": {
        "mappings": {
            "properties": {
                "_class": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "ampId": {
                    "type": "long"
                },
                "amppId": {
                    "type": "long"
                },
                "attributes": {
                    "type": "nested",
                    "properties": {
                        "attributeQualifier": {
                            "type": "keyword"
                        },
                        "attributeType": {
                            "type": "integer"
                        },
                        "attributeTypeDescription": {
                            "type": "keyword"
                        },
                        "attributeValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "countryId": {
                            "type": "long"
                        },
                        "decodedValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "dictionaries": {
                    "type": "nested",
                    "properties": {
                        "abbreviation": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "description": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "dictId": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "endDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "excipients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "extractableEntry": {
                    "type": "boolean"
                },
                "genericNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "id": {
                    "type": "keyword"
                },
                "ingredients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "invalidEntry": {
                    "type": "boolean"
                },
                "pitId": {
                    "type": "integer"
                },
                "ppaCodes": {
                    "type": "nested",
                    "properties": {
                        "code": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "proprietaryNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "qpuUomCde": {
                    "type": "keyword"
                },
                "qpuVal": {
                    "type": "keyword"
                },
                "qtyUomCde": {
                    "type": "keyword"
                },
                "qtyVal": {
                    "type": "keyword"
                },
                "snomedCodes": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "snomedDescriptions": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "startDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "suppliers": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "names": {
                            "type": "nested",
                            "properties": {
                                "endDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    },
                                    "analyzer": "autocomplete_index",
                                    "search_analyzer": "autocomplete_search"
                                },
                                "nameBasisCode": {
                                    "type": "keyword"
                                },
                                "nameChangeCode": {
                                    "type": "keyword"
                                },
                                "nameType": {
                                    "type": "integer"
                                },
                                "nameTypeDescription": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "udfs": {
                    "type": "nested",
                    "properties": {
                        "ddIndicator": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "udfsUomCode": {
                            "type": "keyword"
                        },
                        "udfsValue": {
                            "type": "keyword"
                        },
                        "vmpUomCode": {
                            "type": "keyword"
                        }
                    }
                },
                "vmpId": {
                    "type": "long"
                },
                "vmppId": {
                    "type": "long"
                },
                "vtms": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                }
            }
        }
    }
}

Edit: Added link to full query - https://pastebin.com/DEJvP2uS

Edit: Settings for index:

{
    "index": {
        "max_ngram_diff": "20",
        "analysis": {
            "filter": {
                "autocomplete_suffix_filter": {
                    "type": "ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                },
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "filter": [
                        "lowercase",
                        "autocomplete_filter",
                        "autocomplete_suffix_filter"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                },
                "autocomplete_search": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                }
            }
        },
        "number_of_replicas": "1"
    }
}
  • can you please share the sample data which you are indexing? – ESCoder Nov 13 '20 at 09:26
  • @Bhavya I have provided more information within the question. I can't show all of the sample data for various reasons but I can show you the names stored I'm searching and the mapping for the index – Jamie Briggs Nov 13 '20 at 09:57
  • @JamieBriggs, can you please provide the output of the `_settings` API on your index? See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-settings.html for more info. – Amit Nov 13 '20 at 10:06

2 Answers

In the index mapping provided above, genericNames is of the nested type, so you need to use a nested query.

Here is a working example using the same index data as provided above, along with the search query and search result.

Search Query:

{
  "query": {
    "nested": {
      "path": "genericNames",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "genericNames.name": "phenoxymethylpenicillin"
              }
            }
          ]
        }
      },
      "inner_hits":{}
    }
  }
}

Search Result:

"hits": [
  {
    "_index": "64817981",
    "_type": "_doc",
    "_id": "1",
    "_nested": {
      "field": "genericNames",
      "offset": 0
    },
    "_score": 0.7361701,
    "_source": {
      "nameType": 1,
      "name": "Phenoxymethylpenicillin 250mg tablets",
      "nameChangeCode": "0000",
      "nameBasisCode": "0001",
      "nameTypeDescription": "Name",
      "startDate": "1948-01-01T00:00:00.000000+0000",
      "endDate": "3456-02-01T00:00:00.000000+0000"
    }
  }
]

ESCoder
  • Ahh, I do. I'll update my question with the full query but will have to be a link due to the length – Jamie Briggs Nov 13 '20 at 10:06
  • @JamieBriggs as you are getting search results, I don't think that's the issue here. The index-time tokens are not matching the search-time tokens, which is causing the issue – Amit Nov 13 '20 at 10:07
  • @Bhavya, please refer my answer, you will understand what could be the cause :) – Amit Nov 13 '20 at 10:11
  • Yes, @ElasticsearchNinja nested query is not the cause of OP not getting the result. It must be because of the index and search time analyzers (as you mentioned in your answer) :) – ESCoder Nov 13 '20 at 10:14

This is most likely caused by the custom analyzers on your genericNames.name field. You use different analyzers at index time (autocomplete_index) and at search time (autocomplete_search), but the question originally included only the mapping, not the definitions of these analyzers.

Please provide the output of the _settings API on your index; see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-settings.html for more info.

You need to check the tokens generated for phenoxymethylpenicillin using the analyze API for both the autocomplete_index and autocomplete_search analyzers, and you will notice the difference. With the settings now added to the question, both n-gram filters have max_gram set to 20, so no indexed token is longer than 20 characters, while the search analyzer emits the whole 23-character word as a single term that has nothing to match.
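
To make the token mismatch concrete without a running cluster, here is a rough Python sketch that imitates the two analyzers from the question's settings. It is only an approximation: the regex stands in for the standard tokenizer, the helper names (`index_terms`, `search_terms`, `matches`) are made up for this illustration, and `matches` ignores phrase positions and slop, which do not matter for a single-word query.

```python
import re


def standard_tokens(text):
    # Rough stand-in for the standard tokenizer followed by the lowercase filter.
    return re.findall(r"\w+", text.lower())


def edge_ngrams(token, min_gram, max_gram):
    # Prefixes of the token, like an edge_ngram token filter.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def ngrams(token, min_gram, max_gram):
    # All substrings within the length bounds, like an ngram token filter.
    return [
        token[i:i + n]
        for n in range(min_gram, min(max_gram, len(token)) + 1)
        for i in range(len(token) - n + 1)
    ]


def index_terms(text, max_gram):
    # Imitates autocomplete_index: lowercase, then edge_ngram, then ngram.
    terms = set()
    for token in standard_tokens(text):
        for prefix in edge_ngrams(token, 1, max_gram):
            terms.update(ngrams(prefix, 1, max_gram))
    return terms


def search_terms(text):
    # Imitates autocomplete_search: lowercase only, no n-grams.
    return standard_tokens(text)


def matches(query, indexed):
    # Simplification: every search term must exist as an indexed term.
    return all(term in indexed for term in search_terms(query))


doc = "Phenoxymethylpenicillin 250mg tablets"

indexed_20 = index_terms(doc, max_gram=20)
print(max(len(t) for t in indexed_20))                 # 20
print(matches("phenoxymethylpenicil", indexed_20))     # True
print(matches("phenoxymethylpenicillin", indexed_20))  # False

indexed_25 = index_terms(doc, max_gram=25)
print(matches("phenoxymethylpenicillin", indexed_25))  # True
```

The 20-character query succeeds because it is exactly the longest prefix the index-time filters emit. Any longer search term has no counterpart in the index until max_gram is raised.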

Amit
  • I have attached the settings, I'll look at the analyzers now to see the difference! – Jamie Briggs Nov 13 '20 at 10:14
  • @JamieBriggs As pointed out by ElasticsearchNinja you just need to increase the `max_gram` to at least 25 for both `autocomplete_suffix_filter` and `autocomplete_filter` in your index setting. And also set `"max_ngram_diff": "25"`. The tokens generated will have `phenoxymethylpenicillin` and your query will match on both `Phenoxymethylpenicillin` and `phenoxymethylpenicil` – ESCoder Nov 13 '20 at 10:34
  • @Bhavya Thanks both, will update later! I'm having to do another dump anyways so I've updated the analyzers and my indices are being given new dev data. Will come back to say it works or not, but after investigating, chances are it will :) – Jamie Briggs Nov 13 '20 at 11:16
  • Bingo, sorted my problem out! Thank you very much! – Jamie Briggs Nov 13 '20 at 11:33
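
For anyone applying the fix from the comments, here is a minimal Python sketch that builds the revised settings body and prints it as JSON. The value 25 is the one suggested in the comments. The variable names are illustrative, and the assumption is that the printed body is used when creating a fresh index, since analysis settings cannot be changed on an open index and existing documents must be reindexed before the longer tokens exist.

```python
import json

MIN_GRAM = 1
MAX_GRAM = 25  # value from the comments; must cover the longest word you search for

body = {
    "settings": {
        "index": {
            # Must be at least max_gram - min_gram of the ngram filter.
            "max_ngram_diff": MAX_GRAM,
            "analysis": {
                "filter": {
                    "autocomplete_suffix_filter": {
                        "type": "ngram",
                        "min_gram": MIN_GRAM,
                        "max_gram": MAX_GRAM,
                    },
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": MIN_GRAM,
                        "max_gram": MAX_GRAM,
                    },
                },
                "analyzer": {
                    # Analyzer definitions are unchanged from the question.
                    "autocomplete_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "autocomplete_filter",
                            "autocomplete_suffix_filter",
                        ],
                    },
                    "autocomplete_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    },
                },
            },
        }
    }
}

# Sanity checks before sending the body to the cluster.
ngram_filter = body["settings"]["index"]["analysis"]["filter"]["autocomplete_suffix_filter"]
gram_spread = ngram_filter["max_gram"] - ngram_filter["min_gram"]
assert gram_spread <= body["settings"]["index"]["max_ngram_diff"]
assert ngram_filter["max_gram"] >= len("phenoxymethylpenicillin")

print(json.dumps(body, indent=2))
```

Raising max_gram makes the index larger, because every extra length adds more substrings per word, so it is worth choosing the smallest value that covers your longest search terms.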