
I want to suggest completions for a term based on tokens, similar to Google-style autocomplete but for a single token or word only.

I'd like to search across filenames, which are tokenized. E.g. "BRAND_Connect_A1233.jpg" is tokenized into "brand", "connect", "a1233" and "jpg".
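To make the intended token boundaries concrete, here is a rough local approximation in plain shell. It assumes that the filename tokenizer splits on ;, _, . and / and that the lowercase filter is applied afterwards; tokenize is a hypothetical helper, not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper: approximates the "filename" pattern tokenizer plus the
# lowercase filter, assuming tokens are separated by ; _ . or /
tokenize() {
  # Turn every separator into a newline, squeezing runs such as "_." into one,
  # lowercase everything, then drop any empty line left by a leading separator.
  printf '%s\n' "$1" \
    | tr -s ';_./' '\n\n\n\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sed '/^$/d'
}

tokenize "BRAND_Connect_A1233.jpg"
```

For the example filename this prints brand, connect, a1233 and jpg, one per line.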

Now I'd like to get suggestions for, e.g., "Con". The suggestions should contain the complete matching tokens, not the full filenames:

  • Connect
  • Contour
  • Concept
  • ...

The suggestions for "A12" should be "A1234", "A1233", "A1244", ...
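The expected behaviour can be stated as a small local sketch, independent of Elasticsearch: collect the distinct tokens of all filenames (keeping their original case for display) and return those that start with the typed prefix, ignoring case. suggest is a hypothetical helper, and the separators ;, _, . and / are an assumption.

```shell
#!/bin/sh
# Hypothetical helper: suggest PREFIX FILENAME...
# Prints the distinct tokens (original case) that start with PREFIX, ignoring case.
suggest() {
  prefix=$1
  shift
  # Split every filename into tokens, skip empty lines, keep case-insensitive
  # prefix matches, then de-duplicate in a locale-independent order.
  printf '%s\n' "$@" \
    | tr -s ';_./' '\n\n\n\n' \
    | awk -v p="$prefix" 'NF && tolower(substr($0, 1, length(p))) == tolower(p)' \
    | LC_ALL=C sort -u
}

suggest Con BRAND_ConnectBlue_A1234.jpg BRAND_Connect_A1233.jpg \
  BRAND_ConceptSpace_A1244.jpg DEALER_Contour21_B1233.jpg
```

The alphabetical order is only there to make the output deterministic; a real suggester would rank the options by its own score.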

Example

Queries, facets and filters already work fine.

First I created a mapping including a tokenizer and a filter:

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "search_analyzer" : "filename_search",
               "index_analyzer" : "filename_index"
            }
         }
      }
   }
}'

Both analyzers work pretty well:

curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_search'
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_index'
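As a rough illustration of what the edge_ngram filter in filename_index adds on top of the lowercase tokens (front side, min_gram 2, max_gram 20), this hypothetical shell helper prints the front edge n-grams of a single token. It only approximates the filter, for plain ASCII tokens.

```shell
#!/bin/sh
# Hypothetical helper: edge_ngrams TOKEN [MIN] [MAX]
# Prints the front edge n-grams of TOKEN, from MIN to MAX characters long.
edge_ngrams() {
  token=$1
  min=${2:-2}
  max=${3:-20}
  len=${#token}
  i=$min
  while [ "$i" -le "$max" ] && [ "$i" -le "$len" ]; do
    # Prefix of length i (cut counts characters starting at 1)
    printf '%s\n' "$token" | cut -c1-"$i"
    i=$((i + 1))
  done
}

edge_ngrams connect
```

Every prefix of two or more characters ends up as an indexed term, which is what lets a plain query for "con" match these documents.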

Then I added some example data:

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg"}'
curl -X POST "localhost:9200/files/_refresh"

Various approaches to get the desired suggestions do not deliver the expected results. I have tried naming the analyzers explicitly and various combinations of analyzers and wildcards, for example:

curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "text" : "con",
    "simple_phrase" : {
      "phrase" : {
        "field" : "filename",
        "size" : 15,
        "real_word_error_likelihood" : 0.75,
        "max_errors" : 0.1,
        "gram_size" : 3
      }
    }
}'
curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "my-suggestion" : {
        "text" : "con",
        "term" : {
            "field" : "filename",
            "analyzer": "filename_index"
        }
    }
}'

Answer

You need to add a dedicated field of type completion to the mapping to use the completion suggester, as documented in the official Elasticsearch docs. I've modified your example to show how it works.

First create the index. Note the new filename_suggest field, which is mapped with type completion.

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "analyzer": "filename_index",
               "search_analyzer" : "filename_search"
            },
            "filename_suggest": {
              "type": "completion",
              "analyzer": "simple",
              "search_analyzer": "simple",
              "payloads": true
            }
         }
      }
   }
}'

Add some data. Note that filename_suggest has an input array containing the keywords to match on. Each entry in input can be completed on its own, which is why every filename is split into its tokens here.

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg", "filename_suggest": { "input": ["BRAND", "ConnectBlue", "A1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg", "filename_suggest": { "input": ["BRAND", "Connect", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg", "filename_suggest": { "input": ["BRAND", "ConceptSpace", "A1244", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg", "filename_suggest": { "input": ["COMPANY", "Connect", "A1222", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg", "filename_suggest": { "input": ["COMPANY", "Concept", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg", "filename_suggest": { "input": ["DEALER", "Connect", "B1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg", "filename_suggest": { "input": ["DEALER", "Contour21", "B1233", "jpg"], "payload": {} }}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg", "filename_suggest": { "input": ["DEALER", "ConceptCube", "B2233", "jpg"], "payload": {} }}'
curl -X POST "localhost:9200/files/_refresh"
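Writing the input arrays by hand is tedious and error-prone. A small hypothetical shell helper can derive them from the filename, assuming the same separators as the filename tokenizer (;, _, . and /) and filenames without double quotes or backslashes, which would otherwise need JSON escaping:

```shell
#!/bin/sh
# Hypothetical helper: build_doc FILENAME
# Prints a document body whose filename_suggest.input is derived from FILENAME.
# Assumes FILENAME contains no double quotes or backslashes (no JSON escaping).
build_doc() {
  filename=$1
  # Split on the separators, skip empty tokens, quote each token and
  # join them with ", ".
  inputs=$(printf '%s\n' "$filename" \
    | tr -s ';_./' '\n\n\n\n' \
    | awk 'NF { printf "%s\"%s\"", (n++ ? ", " : ""), $0 }')
  printf '{ "filename" : "%s", "filename_suggest": { "input": [%s], "payload": {} } }\n' \
    "$filename" "$inputs"
}

build_doc "BRAND_Connect_A1233.jpg"
```

The printed body matches the hand-written one above and can be passed to the same curl -X POST call, e.g. with -d "$(build_doc "$f")" inside a loop over the filenames.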

Now perform the query:

curl -XPOST 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "filename_suggest" : {
        "text" : "con",
        "completion": {
            "field": "filename_suggest", "size": 10
        }
    }
}'

Results:

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "filename_suggest" : [ {
    "text" : "con",
    "offset" : 0,
    "length" : 3,
    "options" : [ {
      "text" : "Connect",
      "score" : 2.0,
      "payload":{}
    }, {
      "text" : "Concept",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConceptSpace",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConnectBlue",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "Contour21",
      "score" : 1.0,
      "payload":{}
    } ]
  } ]
}
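If only the suggested tokens are needed, as in the question, the option texts can be pulled out of the pretty-printed response. This is a rough awk sketch that assumes the layout shown above (a single suggestion entry, one "key" : value pair per line); for anything more robust a real JSON parser such as jq is a better fit. option_texts is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads a pretty-printed _suggest response on stdin and
# prints the "text" of every option, one per line.
# Assumes one suggestion entry and one "key" : value pair per line.
option_texts() {
  awk '/"options"/ { opts = 1 }
       opts && /"text"/ {
         sub(/^[^:]*: *"/, "")   # drop everything up to the opening quote
         sub(/",? *$/, "")       # drop the closing quote and optional comma
         print
       }'
}

# Usage against the index above:
# curl -s -XPOST 'localhost:9200/files/_suggest?pretty=true' -d '...' | option_texts
```

The "text" : "con" line before "options" echoes the input and is skipped on purpose.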
krasnaya