I have tried both an ngram token filter and an ngram tokenizer, and they seem to produce the same results when I test the analyzers:
settings: {
  analysis: {
    filter: {
      ngram_filter: {
        type: "ngram",
        min_gram: 2,
        max_gram: 20
      }
    },
    tokenizer: {
      ngram_tokenizer: {
        type: "ngram",
        min_gram: 2,
        max_gram: 20
      }
    },
    analyzer: {
      index_ngram: {
        type: "custom",
        tokenizer: "keyword",
        filter: [ "ngram_filter", "lowercase" ]
      },
      index_ngram2: {
        type: "custom",
        tokenizer: "ngram_tokenizer",
        filter: [ "lowercase" ]
      }
    }
  }
}
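To spell out what I expect both analyzers to do, here is a rough Python sketch. It is not Elasticsearch code: the function names are my own, and I'm assuming the ngram tokenizer's default token_chars keeps every character, so the whole input is treated as a single run. Under those assumptions both pipelines emit every lowercased substring of length 2 to 20:

```python
def char_ngrams(text, min_gram=2, max_gram=20):
    """Return every substring of text with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer grams from this start would run past the end
            grams.append(text[start:end])
    return grams


def index_ngram(text):
    """Sketch of: keyword tokenizer -> ngram_filter -> lowercase."""
    tokens = [text]  # keyword tokenizer: the whole input is one token
    tokens = [g for t in tokens for g in char_ngrams(t)]
    return [t.lower() for t in tokens]


def index_ngram2(text):
    """Sketch of: ngram_tokenizer -> lowercase (assumes default token_chars)."""
    tokens = char_ngrams(text)
    return [t.lower() for t in tokens]


if __name__ == "__main__":
    a = index_ngram("P&G 40-Bh")
    b = index_ngram2("P&G 40-Bh")
    print(len(a))   # 36 grams for this 9-character input
    print(a == b)   # True in this simplified model
```

This only models the token text. It says nothing about the positions or offsets Elasticsearch assigns to each token, which may be where the two approaches differ.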
I get the same results doing:
curl -X GET "localhost:9200/my_index/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "index_ngram",
  "text": "P&G 40-Bh"
}
'
and
curl -X GET "localhost:9200/my_index/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "index_ngram2",
  "text": "P&G 40-Bh"
}
'
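Comparing the two responses by eye only tells me that the token text matches. A small helper like the following could also compare the position and offsets that the _analyze response reports for each token. This is a sketch: the function names are hypothetical, and the sample responses are made up to exercise the helper, not real output from my index.

```python
def token_tuples(analyze_response):
    """Flatten an _analyze response into (token, position, start, end) tuples."""
    return [
        (t["token"], t["position"], t["start_offset"], t["end_offset"])
        for t in analyze_response["tokens"]
    ]


def compare_analyses(resp_a, resp_b):
    """Report whether two _analyze responses agree on text only or on everything."""
    a = token_tuples(resp_a)
    b = token_tuples(resp_b)
    return {
        # Same multiset of token strings, ignoring order and metadata
        "same_token_text": sorted(t[0] for t in a) == sorted(t[0] for t in b),
        # Same strings in the same order with identical positions and offsets
        "same_positions_and_offsets": a == b,
    }


if __name__ == "__main__":
    # Made-up responses for illustration only (not real Elasticsearch output)
    sample_a = {"tokens": [
        {"token": "ab", "position": 0, "start_offset": 0, "end_offset": 2},
    ]}
    sample_b = {"tokens": [
        {"token": "ab", "position": 1, "start_offset": 0, "end_offset": 2},
    ]}
    print(compare_analyses(sample_a, sample_b))
```

To use it, I would save each curl response as JSON, load both with json.load, and pass the resulting dicts to compare_analyses.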
Which one should I use? Is there a performance difference? It looks like they just perform the same operations in a different order, but I'm not sure which is more performant or which is the better convention.