I recently upgraded from Elasticsearch 1.4 to 5.4 and I'm struggling to migrate my autocomplete queries efficiently. The problem is that I want to have a completion suggester where the output is different from the input.

The documents I store have a categories field, which is an array of strings holding category URIs (URIs because the categories form a tree). The last part of each URI, which I call the label, is the input of the completion suggester, but in the response I would like to retrieve the full URI.

So let's say I have two documents:

{
    "name" : "Lord of The Rings",
    "categories" : ["Books/Genre/Fantasy", "Books/Language/English"]
}

and

{
    "name" : "Game of Thrones",
    "categories" : ["Series/Genre/Fantasy", "Series/Host/HBO"]
}

My input is "Fant" and I want to get as a response the URIs for the "Series/Genre/Fantasy" and "Books/Genre/Fantasy" categories.

Previously with ES 1.4, I was able to create a completion suggester with a different output for a given input, so I indexed my suggesters like this:

{
    "suggest" : {
        "input": [ "Fantasy"],
        "output": "Series/Genre/Fantasy"
    }
}

and

{
    "suggest" : {
        "input": [ "Fantasy"],
        "output": "Books/Genre/Fantasy"
    }
}

But in ES 5.4, the output property doesn't exist anymore for completion suggesters so all I get in the response is the input property of my suggest field, which is the label "Fantasy", but I want the URI.

Right now, my workaround is to read the categories field from the _source of each document returned in the response and keep only the categories whose label starts with the input "Fant". This is very inefficient, since I need to map every category of every returned document to its label and compare it with the input.
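For reference, the workaround amounts to roughly the following. This is only a sketch: the `hits` list stands in for the documents returned by the search, and the helper names `label_of` and `matching_categories` are made up for illustration. The prefix check here is case-insensitive, which is an assumption about the analyzer in use.

```python
def label_of(uri):
    """Return the last path segment of a category URI (the 'label')."""
    return uri.rsplit("/", 1)[-1]


def matching_categories(hits, prefix):
    """Collect category URIs whose label starts with the typed prefix.

    Every category of every returned document has to be inspected,
    which is the inefficiency described above.
    """
    prefix_lower = prefix.lower()
    matches = []
    for hit in hits:
        for uri in hit.get("_source", {}).get("categories", []):
            if label_of(uri).lower().startswith(prefix_lower) and uri not in matches:
                matches.append(uri)
    return matches


# Stand-in for the documents returned by the search (illustrative data).
hits = [
    {"_source": {"name": "Lord of The Rings",
                 "categories": ["Books/Genre/Fantasy", "Books/Language/English"]}},
    {"_source": {"name": "Game of Thrones",
                 "categories": ["Series/Genre/Fantasy", "Series/Host/HBO"]}},
]

result = matching_categories(hits, "Fant")
print(result)
```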

Isn't there a more efficient way to do that with ES suggesters? What am I missing?

WhiteFangs

1 Answer

Elasticsearch's completion suggester changed in 5.0: support for specifying an output when indexing suggestion entries was removed. The text of a suggestion result entry is now always the un-analyzed value of the suggestion's input (the same as not specifying output while indexing suggestions in pre-5.0 indices). So you need to store the output as a sibling field of the suggest key in the document body.
Here's what it should look like:

Mapping:

{
    "mappings": {
        "<type>" : {
            "properties" : {
                "suggest" : {
                    "type" : "completion"
                },
                "output" : {
                    "type": "keyword"
                }
            } 
        }
    }
}

Don't forget to replace <type> with the name of your mapping type.

Indexing:

/<index_name>/<type_name>

{
    "suggest" : {
        "input": ["Fantasy"],
        "weight" : 1
    },
    "output": "Series/Genre/Fantasy"
}

Here, the field name output can be replaced by anything; it is just metadata stored on your document.
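Applied to the category URIs from the question, the documents to index could be generated along these lines. This is a sketch under the assumption that you index one suggestion document per category URI; the helper name `suggestion_doc` is hypothetical, and the field names `suggest` and `output` follow the example above.

```python
import json


def suggestion_doc(uri, weight=1):
    """Build one suggestion document for a category URI.

    The label (last URI segment) becomes the suggester input, and the
    full URI is kept in a sibling 'output' field.
    """
    label = uri.rsplit("/", 1)[-1]
    return {
        "suggest": {"input": [label], "weight": weight},
        "output": uri,
    }


categories = ["Series/Genre/Fantasy", "Books/Genre/Fantasy"]
docs = [suggestion_doc(uri) for uri in categories]

# Each of these bodies would be sent to /<index_name>/<type_name>.
for doc in docs:
    print(json.dumps(doc))
```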

Query:

/<index_name>/_search

{
    "suggest": {
        "show-suggest" : {
            "prefix" : "Fant",
            "completion" : {
                "field" : "suggest"
            }
        }
    }
}
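In 5.x, each entry in `options` carries the `_source` of the matching document, so the full URI can be read from the `output` field there. A minimal sketch of that lookup follows. The `response` dict is hand-written sample data in the shape the 5.x suggest response is expected to have, not real output, and `outputs_from_response` is a hypothetical helper.

```python
def outputs_from_response(response, suggester_name="show-suggest", field="output"):
    """Read the stored output value from each suggestion option's _source."""
    outputs = []
    for entry in response.get("suggest", {}).get(suggester_name, []):
        for option in entry.get("options", []):
            value = option.get("_source", {}).get(field)
            if value is not None:
                outputs.append(value)
    return outputs


# Hand-written sample in the expected response shape (illustrative only).
response = {
    "suggest": {
        "show-suggest": [
            {
                "text": "Fant",
                "offset": 0,
                "length": 4,
                "options": [
                    {"text": "Fantasy", "_score": 1.0,
                     "_source": {"suggest": {"input": ["Fantasy"], "weight": 1},
                                 "output": "Series/Genre/Fantasy"}},
                    {"text": "Fantasy", "_score": 1.0,
                     "_source": {"suggest": {"input": ["Fantasy"], "weight": 1},
                                 "output": "Books/Genre/Fantasy"}},
                ],
            }
        ]
    }
}

uris = outputs_from_response(response)
print(uris)
```

With one suggestion document per category URI, every option maps to exactly one output, so no prefix matching against the input is needed on the client side.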

I hope this helps.

iVetal
mayankchutani
  • This doesn't solve my problem, because I end up using the same kind of workaround as I do now: look in the `_source` of every document in the response and get the information from the metadata by checking it against the input. Your answer does optimize my current method, since I would store the output next to the suggest field and get it back by filtering on the suggest fields that matched, but it still seems quite inefficient to me. I would like to have the output in the `options` of the response and not have to look for it in the `_source` of every document. But I'm not sure that's possible. – WhiteFangs Aug 16 '17 at 15:15
  • I'd say you should design your index in such a way that you get your desired output at the first index of the result array. So if you have multiple outputs right now, split them into multiple indexes so that you always get the best match at the first index of the highest-ranked document. I hope this answer is close enough to your question; I'd appreciate it if you accepted it as the response to your question. Thanks. – mayankchutani Aug 20 '17 at 08:55
  • I think you're right and that the old way to get outputs (I'd say the ideal solution in my specific case) doesn't exist anymore. – WhiteFangs Aug 21 '17 at 09:48
  • Thanks. Glad I was able to help. – mayankchutani Aug 21 '17 at 14:19