I want to create suggestions on how to complete a term based on tokens, similar to google like autocomplete but only with one token or word.
I'd like to search across filenames who will be tokenized. E.g. "BRAND_Connect_A1233.jpg" gets tokenized into "brand", "connect", "a1234" and "jpg".
Now I'd like to ask for some suggestion for e.g. "Con". The suggestion should deliver the complete matching tokens, not the full filename:
- Connect
- Contour
- Concept
- ...
The suggestion for "A12" should be "A1234", "A1233", "A1233" ...
Example
Working with queries, facets and filters works fine.
First I created a mapping including a tokenizer and a filter:
curl -XPUT 'localhost:9200/files/?pretty=1' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"filename_search" : {
"tokenizer" : "filename",
"filter" : ["lowercase"]
},
"filename_index" : {
"tokenizer" : "filename",
"filter" : ["lowercase","edge_ngram"]
}
},
"tokenizer" : {
"filename" : {
"pattern" : "[^[;_\\.\\/]\\d]+",
"type" : "pattern"
}
},
"filter" : {
"edge_ngram" : {
"side" : "front",
"max_gram" : 20,
"min_gram" : 2,
"type" : "edgeNGram"
}
}
}
},
"mappings" : {
"file" : {
"properties" : {
"filename" : {
"type" : "string",
"search_analyzer" : "filename_search",
"index_analyzer" : "filename_index"
}
}
}
}
}'
Both analyzers work pretty well:
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_search'
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_index'
Now I added some example data
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg"}'
curl -X POST "localhost:9200/files/_refresh"
Various approaches to get the desired suggestion does not deliver the expected results. I had tried to name the analyzers and tried various combinations of analyzers and wildcards.
curl -XGET 'localhost:9200/files/_suggest?pretty=true' -d '{
"text" : "con",
"simple_phrase" : {
"phrase" : {
"field" : "filename",
"size" : 15,
"real_word_error_likelihood" : 0.75,
"max_errors" : 0.1,
"gram_size" : 3
}
}
}'
curl -XGET 'localhost:9200/files/_suggest?pretty=true' -d '{
"my-suggestion" : {
"text" : "con",
"term" : {
"field" : "filename",
"analyzer": "filename_index"
}
}
}'