I'm working with Elasticsearch. I have a collection of events with event names, for example FC Barcelona - Real Madrit, and somewhere else in the collection there may be Footbal Club Barcela - FC Real Madryt.
I need to find groups of at least 2 matching documents (hits) without providing any query text. I think an aggregation and an ngram tokenizer should be used here, but I'm not sure.
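To illustrate why I suspect n-grams could help, here is a minimal sketch in plain Python (not Elasticsearch itself) that compares names by their shared character trigrams. The helper names `char_ngrams` and `jaccard` and the third event name are made up for illustration, and Jaccard similarity is just one possible way to score the overlap.

```python
from __future__ import annotations


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of lowercase character n-grams of `text`."""
    # Lowercase and collapse repeated whitespace before slicing.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared n-grams divided by all distinct n-grams."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


name_a = "FC Barcelona - Real Madrit"
name_b = "Footbal Club Barcela - FC Real Madryt"
name_c = "Chelsea - Arsenal"  # hypothetical unrelated event, not from my data

sim_ab = jaccard(char_ngrams(name_a), char_ngrams(name_b))
sim_ac = jaccard(char_ngrams(name_a), char_ngrams(name_c))

# The two spellings of the same match should score higher than the unrelated pair.
print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

The two spellings of the same match share trigrams such as "bar", "rea" and "mad", which is the kind of overlap I would like Elasticsearch to group on.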
Here are my index settings:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "test": {
          "tokenizer": "test",
          "filter": ["lowercase", "word_delimiter", "nGram", "porter_stem"]
        }
      },
      "tokenizer": {
        "test": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 15,
          "token_chars": [
            "letter",
            "digit",
            "whitespace"
          ]
        }
      }
    }
  }
}
And this is what my current query looks like:
{
  "size": 0,
  "aggs": {
    "duplicateNames": {
      "terms": {
        "field": "eventName",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
And here is my mapping:
{
  "event": {
    "properties": {
      "eventName": {
        "type": "keyword"
        // fielddata: true
      }
    }
  }
}
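As far as I understand, a terms aggregation on a keyword field buckets documents by the exact stored value, so with this mapping only byte-for-byte identical names can end up in the same bucket. The sketch below imitates that behaviour in plain Python; it is not an Elasticsearch call, `exact_duplicate_buckets` is a made-up helper, and the repeated third entry is sample data I added for the illustration.

```python
from collections import Counter

# Sample data: the first two names are from my example,
# the third is an added exact repeat of the first.
event_names = [
    "FC Barcelona - Real Madrit",
    "Footbal Club Barcela - FC Real Madryt",
    "FC Barcelona - Real Madrit",
]


def exact_duplicate_buckets(names, min_doc_count=2):
    """Group names by exact value and keep groups with enough members,
    roughly like a terms aggregation with min_doc_count on a keyword field."""
    counts = Counter(names)
    return {name: count for name, count in counts.items() if count >= min_doc_count}


buckets = exact_duplicate_buckets(event_names)
print(buckets)
```

Only the exact repeat forms a bucket here; the differently spelled name is left out, which is the problem I am trying to solve.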
Could you point me in the right direction, please?