Elasticsearch: search_as_you_type datatype vs. tokenizer edge_ngram

Question

What is the difference between the new search_as_you_type datatype in Elasticsearch and the edge_ngram tokenizer? Which one is preferable for building a search-as-you-type search engine?

The Elasticsearch documentation describes both approaches:

search_as_you_type datatype: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html

tokenizer type edge_ngram: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html (Look at the example of how to set up a field for search-as-you-type.)

UPDATE

Elasticsearch version : 7.6.1

I indexed my data with the search_as_you_type data type, following the latest Elasticsearch documentation, and I am trying to build a simple query via the Java API based on the example below:

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "my_field",
        "my_field._2gram",
        "my_field._3gram"
      ]
    }
  }
}

The point that I struggle with is adding "type": "bool_prefix".

A) I tried with MultiMatchQueryBuilder

MultiMatchQueryBuilder multiMatchQueryBuilder=new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MatchQuery.Type.BOOLEAN_PREFIX);

and got an exception at the second line of the code above:

org.elasticsearch.ElasticsearchParseException: failed to parse [multi_match] query type [boolean_prefix]. unknown type.

B) Then I tried with MatchBoolPrefixQueryBuilder

MatchBoolPrefixQueryBuilder matchBoolPrefixQueryBuilder=new MatchBoolPrefixQueryBuilder(value, fields);

and got an exception:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=parsing_exception, reason=[match_bool_prefix] unknown token [START_ARRAY] after [query]]
...
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/my_dictionary/_search?pre_filter_shard_size=128&typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57}],"type":"parsing_exception","reason":"[match_bool_prefix] unknown token [START_ARRAY] after [query]","line":1,"col":57},"status":400}

at line

SearchResponse searchResponse=restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

What am I doing wrong? Which one should I use and how?

SOLUTION

I solved the issue just by changing the type to:

MultiMatchQueryBuilder multiMatchQueryBuilder=new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type("bool_prefix");

But I don't understand why the type must be hardcoded as "bool_prefix" instead of using MatchQuery.Type.BOOLEAN_PREFIX, or why it is not possible to use MatchBoolPrefixQueryBuilder. There are not many implementation examples of this query.

Val · Accepted Answer · 2020-06-24T10:42:17.313


The two are different things.

edge_ngram is a tokenizer, which means it kicks in at indexing time to tokenize your input data. There is also an edge_ngram token filter. Both are similar but work at different levels. See this thread to learn about the main differences.
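To make the idea of edge n-grams concrete, here is a minimal plain-Java sketch of what such tokenization produces for a single word. It is not Elasticsearch code: the class EdgeNgramSketch and the method edgeNgrams are hypothetical names, and the parameters only mirror the min_gram and max_gram settings in spirit.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of edge n-grams; hypothetical helper, not part of Elasticsearch. */
final class EdgeNgramSketch {

    /** Returns every prefix of the token whose length lies between minGram and maxGram. */
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // Never read past the end of the token.
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [q, qu, qui, quic, quick]
        System.out.println(edgeNgrams("quick", 1, 5));
    }
}
```

Because every prefix is stored as its own term, a partially typed word such as "qui" can match with an ordinary term lookup instead of a prefix scan.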

search_as_you_type is a field type that contains a few sub-fields. One of them is called _index_prefix and is built on edge n-grams (the field type documentation describes it as wrapping the _3gram analyzer with an edge ngram token filter rather than using the tokenizer directly).

So basically, the edge n-gram technique you see in the edge_ngram tokenizer documentation is what was leveraged when the new search_as_you_type field type was added.
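The _2gram and _3gram sub-fields queried in the question hold shingles (runs of adjacent words), according to the field type documentation. The following plain-Java sketch shows roughly what those terms look like. It ignores the details of the real analyzers, and ShingleSketch and shingles are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration of word shingles; hypothetical helper, not part of Elasticsearch. */
final class ShingleSketch {

    /** Splits the text on whitespace and joins every run of `size` adjacent words. */
    static List<String> shingles(String text, int size) {
        String[] terms = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= terms.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(terms, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [quick brown, brown fox]
        System.out.println(shingles("quick brown fox", 2));
        // prints [quick brown fox]
        System.out.println(shingles("quick brown fox", 3));
    }
}
```

Querying the root field together with the shingle sub-fields, as the multi_match example in the question does, lets documents that contain the typed words in the same order match on more fields.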

UPDATE

You actually need to use

MultiMatchQueryBuilder multiMatchQueryBuilder=new MultiMatchQueryBuilder(value, fields);
multiMatchQueryBuilder.type(MultiMatchQueryBuilder.Type.BOOL_PREFIX);

You can see here how that enumeration value is built
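As for why MatchQuery.Type.BOOLEAN_PREFIX fails while the string "bool_prefix" works, the error message in the question suggests that the type is resolved by name. The sketch below is a simplified stand-alone model of that behaviour, not the actual Elasticsearch source: it assumes the type(Object) overload lower-cases the argument's toString() and looks the result up among the multi_match type names. All class, enum and method names here are stand-ins.

```java
import java.util.Locale;

/** Simplified model of name-based type parsing; assumed behaviour, not Elasticsearch source. */
final class TypeParsingSketch {

    /** Stand-in for MatchQuery.Type, where the constant is named BOOLEAN_PREFIX. */
    enum MatchType { BOOLEAN, PHRASE, PHRASE_PREFIX, BOOLEAN_PREFIX }

    /** Stand-in for MultiMatchQueryBuilder.Type, where each constant carries its JSON name. */
    enum MultiMatchType {
        BEST_FIELDS("best_fields"),
        MOST_FIELDS("most_fields"),
        BOOL_PREFIX("bool_prefix");

        final String jsonName;

        MultiMatchType(String jsonName) {
            this.jsonName = jsonName;
        }

        /** Looks the value up by JSON name and rejects anything unknown. */
        static MultiMatchType parse(String value) {
            for (MultiMatchType t : values()) {
                if (t.jsonName.equals(value)) {
                    return t;
                }
            }
            throw new IllegalArgumentException(
                "failed to parse [multi_match] query type [" + value + "]. unknown type.");
        }
    }

    /** Models an overload that accepts any object and parses its lower-cased name. */
    static MultiMatchType typeFromObject(Object type) {
        return MultiMatchType.parse(type.toString().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // The string form matches the JSON name, so it resolves.
        System.out.println(typeFromObject("bool_prefix"));
        try {
            // The enum constant becomes "boolean_prefix", which is not a known name.
            typeFromObject(MatchType.BOOLEAN_PREFIX);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, passing MultiMatchQueryBuilder.Type.BOOL_PREFIX avoids the name lookup entirely, which is why it is the safer choice than either the string or the MatchQuery constant.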

edited Jun 24 '20 at 10:42

answered Jun 19 '20 at 09:20

Val


Thank you for your reply. I indexed my data with the search_as_you_type data type and am trying to query it via the Java API with a multi_match query of type bool_prefix, as stated in the documentation. I used MultiMatchQueryBuilder and set the type with multiMatchQueryBuilder.type(MatchQuery.Type.BOOLEAN_PREFIX); which threw an exception. Then I tried MatchBoolPrefixQueryBuilder and got a different exception: [type=parsing_exception, reason=[match_bool_prefix] unknown token [START_ARRAY] after [query]]. My query works fine in Kibana. What am I doing wrong? Which one should I use? – PARO Jun 24 '20 at 08:25
Maybe you should update your question with the queries you tried, the errors you get and the query you tried in Kibana and which worked. – Val Jun 24 '20 at 08:26
Which version of ES are you using? multi match `bool_prefix` came out in 7.2 and wasn't available before, which explains the `unknown type` you're getting – Val Jun 24 '20 at 08:52
I'm using Elasticsearch-7.6.1 – PARO Jun 24 '20 at 08:59
I added how I solved the issue, but I don't understand why the solution has to be that way. – PARO Jun 24 '20 at 10:15
