
I am using Elasticsearch to index entities that contain two fields: agencyName and agencyAddress.

Let's say I have indexed one entity:

{
    "agencyName": "Turismo Viajes",
    "agencyAddress": "Av. Maipú 500"
}

I would like to search on agencyName and always get the entity above back. Example searches:

1) urismo
2) Viaje
3) Viajes
4) Turismo
5) uris

The idea is that any of those query strings should return that entity (probably with a different score depending on how close the match is).

For this I thought that nGrams would work, so I defined a global analyzer called phrase in my elasticsearch.yml file.

index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: nGram
        filter: [nGram, lowercase, asciifolding]
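
To see what this analyzer actually emits, I believe the _analyze API can be used (a quick sketch, assuming a local node on port 9200):

curl -XGET 'http://localhost:9200/_analyze?analyzer=phrase' -d 'Turismo'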

And I created the agency index like this:

{
  "possible_clients" : {
    "possible_client" : {
      "properties" : {
        "agencyName" : {
          "type" : "string",
          "analyzer" : "phrase"
        },
        "agencyAddress" : {
          "type" : "string"
        }
      }
    }
  }
}

The problem is that when making a call like this:

curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
    "query": { "term": { "agencyName": "uris" }}
}'

I don't get any hits. Any ideas what I am doing wrong?

Thanks in advance.

Agustin Lopez

2 Answers


You are using a term query for searching. A term query is never analyzed, so changing the analyzer will have no effect. You should use, for example, a match query, which runs the query string through the field's analyzer.
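
A minimal sketch, using the same index and field names as in the question:

curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
    "query": { "match": { "agencyName": "uris" }}
}'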

Saskia Vola

According to the docs, the default max_gram of the nGram tokenizer is 2, so you only index two-character grams: tu, ur, ri, is, sm, mo, etc.
The term query does not analyze your input, so you are searching for uris, and uris was never indexed.

Try setting a larger max_gram:

See the docs on the ngram tokenizer and the ngram token filter.

And maybe you should not use both the nGram tokenizer and the nGram filter. I have always used just the filter, with the whitespace tokenizer, for example:
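
A sketch of that combination in elasticsearch.yml (the filter name my_ngram and the gram sizes are my own illustration, not from the original setup):

index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: whitespace              # split into words first
        filter: [lowercase, asciifolding, my_ngram]
    filter:
      my_ngram:
        type: nGram                        # substring grams, not just prefixes
        min_gram: 1
        max_gram: 20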

Here is an edgeNGram filter we had to define; nGrams should work just the same.

"filter" : {    
"my_filter" : {
    "type" : "edgeNGram",
    "min_gram" : "1",
    "max_gram" : "20"
}
}
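
Note that edgeNGram only emits grams anchored at the start of each token, so with the filter above a search for Turi would match but uris would not. For the substring searches in the question, the plain nGram filter is the better fit.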

Hope it helps.

DeH