
I tried two different approaches to creating the index, and neither returns anything when I search for part of a word. Basically, if I search for the first letters of a word, or for letters from the middle of the word, I want to get all matching documents.

FIRST ATTEMPT AT CREATING THE INDEX (from an older Stack Overflow question):

POST correntistas/correntista
{
  "index": {
    "index": "correntistas",
    "type": "correntista",
    "analysis": {
      "index_analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "mynGram"
          ]
        }
      },
      "search_analyzer": {
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "mynGram"
          ]
        }
      },
      "filter": {
        "mynGram": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 50
        }
      }
    }
  }
}
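An aside on this first attempt: in 6.x a request of the form POST correntistas/correntista does not create an index at all; it indexes the body as a document. Analysis settings have to be supplied at index creation time with PUT, and "index_analyzer"/"search_analyzer" are not valid sections under "analysis" (analyzers go under "analyzer" and are wired to fields in the mapping). Below is a hedged sketch of how this first attempt could be expressed as a valid 6.x index creation; the analyzer and filter names are kept from the snippet above, "ngram" replaces the deprecated "nGram" spelling, index.max_ngram_diff is raised to allow the 2..50 gram range, and the search analyzer deliberately omits the ngram filter so the query string itself is not ngrammed:

```json
PUT /correntistas
{
  "settings": {
    "index.max_ngram_diff": 48,
    "analysis": {
      "filter": {
        "mynGram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 50
        }
      },
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "mynGram"]
        },
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "correntista": {
      "properties": {
        "nome": {
          "type": "text",
          "analyzer": "my_index_analyzer",
          "search_analyzer": "my_search_analyzer"
        }
      }
    }
  }
}
```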

SECOND ATTEMPT AT CREATING THE INDEX (from a more recent Stack Overflow question):

PUT /correntistas
{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                },
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "nome": {
                "type": "text",
                "analyzer": "autocomplete_index",
                "search_analyzer": "autocomplete_search"
            }
        }
    }
}

This second attempt failed with:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Root mapping definition has unsupported parameters:  [nome : {search_analyzer=autocomplete_search, analyzer=autocomplete_index, type=text}]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [properties]: Root mapping definition has unsupported parameters:  [nome : {search_analyzer=autocomplete_search, analyzer=autocomplete_index, type=text}]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "Root mapping definition has unsupported parameters:  [nome : {search_analyzer=autocomplete_search, analyzer=autocomplete_index, type=text}]"
    }
  },
  "status": 400
}
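This mapper_parsing_exception is a versioning issue rather than a problem with the analysis itself: in Elasticsearch 6.8 the mappings section must be keyed by the type name (typeless mappings only became the default in 7.0). A sketch of the same request adapted to 6.8, with the mapping wrapped in the correntista type (alternatively, 6.7+ accepts the typeless form if the request is sent with the ?include_type_name=false query parameter):

```json
PUT /correntistas
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "autocomplete_index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "correntista": {
      "properties": {
        "nome": {
          "type": "text",
          "analyzer": "autocomplete_index",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
```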

(Kibana screenshot)

Although the first approach created the index without an exception, searching does not work when I type part of the property "nome".

I added one document this way:

POST /correntistas/correntista/1
    {
        "conta": "1234",
        "sobrenome": "Carvalho1",
        "nome": "Demetrio1"
    }

Now I want to retrieve the above document either by typing the first letters (e.g. "De") or part of the word from the middle (e.g. "met"). But neither of the two queries below retrieves the document.

Simple way to query:

GET correntistas/correntista/_search
{
    "query": {
        "match": {
            "nome": {
            "query": "De" #### "met" should also work, from my perspective
            }
        }
    }
}

A more elaborate query is also failing:

GET correntistas/correntista/_search
{
    "query": {
        "match": {
            "nome": {
                "query": "De",  #### "met" should also work, from my perspective
                "operator": "OR",
                "prefix_length": 0,
                "max_expansions": 50,
                "fuzzy_transpositions": true,
                "lenient": false,
                "zero_terms_query": "NONE",
                "auto_generate_synonyms_phrase_query": true,
                "boost": 1
            }
        }
    }
}
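When a match query like this returns nothing, the _analyze API is a quick way to check which tokens (if any) the field's analyzer actually produces; if the analyzer was never registered on the index, the call fails with an unknown-analyzer error, which is itself a useful diagnosis. A sketch, assuming the analyzer name from the first attempt above:

```json
GET correntistas/_analyze
{
  "analyzer": "my_index_analyzer",
  "text": "Demetrio1"
}
```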

I don't think it is relevant, but here are the versions (I am using this version because it is intended to work in production with Spring Data, and there is some "delay" in newer Elasticsearch versions being supported by Spring Data):

Elasticsearch and Kibana 6.8.4

PS.: please don't suggest that I use regular expressions or wildcards (*).

*** Edited

All steps below were done in the Console (Kibana/Dev Tools)

Step 1:

POST /correntistas/correntista
{
  "settings": {
    "index.max_ngram_diff" :10,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram", 
          "min_gram": 2,
          "max_gram": 8
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}

Results on right panel:

#! Deprecation: the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template
{
  "_index" : "correntistas",
  "_type" : "correntista",
  "_id" : "alrO-3EBU5lMnLQrXlwB",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
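The auto-generated _id (alrO-3EBU5lMnLQrXlwB) in this response is the giveaway: POST /correntistas/correntista does not create an index, it indexes the request body as a new document, so the whole settings/mappings payload was stored as data and the index was auto-created with default settings (which is why the analysis block is missing from _settings later). Listing the index content makes this visible:

```json
GET correntistas/_search
{
  "query": {
    "match_all": {}
  }
}
```

On 6.x, an index with custom analysis has to be created with PUT /correntistas before any document is indexed into it.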

Step 2:

POST /correntistas/correntista/1
{
    "title" : "Demetrio1"
}

Results on right panel:

{
  "_index" : "correntistas",
  "_type" : "correntista",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Step 3:

GET correntistas/_search
{
    "query" :{
        "match" :{
            "title" :"met"
        }
    }
}

Results on right panel:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

In case it is relevant:

Added document type on get url

GET correntistas/correntista/_search
{
    "query" :{
        "match" :{
            "title" :"met"
        }
    }
}

Also brings nothing:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Searching with entire title text

GET correntistas/_search
{
    "query" :{
        "match" :{
            "title" :"Demetrio1"
        }
    }
}

Brings the document:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "correntistas",
        "_type" : "correntista",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "Demetrio1"
        }
      }
    ]
  }
}

Looking at the index settings, it is interesting that the analyzer is not there:

GET /correntistas/_settings

Result on right panel:

{
  "correntistas" : {
    "settings" : {
      "index" : {
        "creation_date" : "1589067537651",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "jm8Kof16TAW7843YkaqWYQ",
        "version" : {
          "created" : "6080499"
        },
        "provided_name" : "correntistas"
      }
    }
  }
}

How I run Elasticsearch and Kibana

docker network create eknetwork

docker run -d --name elasticsearch --net eknetwork -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.8.4

docker run -d --name kibana --net eknetwork -p 5601:5601 kibana:6.8.4

1 Answer

In my answer to this other SO question, the requirement was essentially a prefix search, i.e. for the text Demetrio1 only searches like de or demet were required, which worked because I used an edge-ngram filter there. In this question, the requirement is infix search, for which we will use an ngram filter in our custom analyzer.

Below is a step-by-step example.

Index def

{
  "settings": {
    "index.max_ngram_diff" :10,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram", // note this
          "min_gram": 2,
          "max_gram": 8
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}
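A note for the 6.8 setup in the question: the index definition above assumes 7.x-style typeless mappings. On 6.8, the index should presumably be created with PUT /correntistas and the mappings section wrapped in a type name (or sent with ?include_type_name=false). A sketch of the 6.8-compatible mappings fragment, keeping the same settings block as above:

```json
"mappings": {
  "correntista": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
```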

Index sample doc

{
    "title" : "Demetrio1"
}

Search query

{
    "query" :{
        "match" :{
            "title" :"met"
        }
    }
}

The search result brings the sample doc :)

 "hits": [
            {
                "_index": "ngram",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.47766083,
                "_source": {
                    "title": "Demetrio1"
                }
            }
        ]
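Why "met" matches here: the standard tokenizer plus lowercase turns Demetrio1 into the single token demetrio1, and the ngram filter then emits every 2-to-8-character substring of it, "met" included, while the standard search_analyzer leaves the query "met" intact so it hits that indexed token. This can be verified with the _analyze API (the index name ngram below is taken from the _index field of the search result above):

```json
GET ngram/_analyze
{
  "analyzer": "autocomplete",
  "text": "Demetrio1"
}
```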
  • thanks for your help. Unfortunately it is not working. How do you create the index? Did you POST or PUT it? If I try PUT, it complains: "type": "mapper_parsing_exception", "reason": "Root mapping definition has unsupported parameters: [title : {search_analyzer=standard, analyzer=autocomplete, type=text}]" – Jim C May 09 '20 at 23:52
  • Please, let me know the exact Elasticsearch version you are using. Also, if possible, can you add some pictures of how you are submitting the commands? Are you using Kibana Dev Tools, Postman or curl? Maybe some extra package added in my case is blocking me from PUTting a new index with a mapping. I prefer 6.8 because of Spring Data compatibility, but first things first: let me get it working and then I will see what to do with Spring Data. – Jim C May 10 '20 at 00:07
  • Also, I am running both Elasticsearch and Kibana from Docker, but I can hardly imagine any issue with that. I added above how I am starting them through docker run. – Jim C May 10 '20 at 00:10
  • I tried with 7.X, but it shouldn't matter as there are no breaking changes in these APIs, and yes, you are correct that running it in Docker also doesn't make any difference. I will share my Postman collection with you sometime – Amit May 10 '20 at 01:12
  • It seems there are some significant differences between versions 6.x and 7.x: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html. Does it ring a bell to you? Copied from this link: "... Types are deprecated in APIs in 7.0, with breaking changes to the index creation, put mapping, get mapping, put template, get template and get field mappings APIs..." I will give it a try with the latest version and ignore Spring Data for a while – Jim C May 10 '20 at 02:02
  • I got it working after a few changes, but I am very confused. Firstly, I updated to the latest Elasticsearch version, 7.6.2. Then, instead of POSTing the index, I PUT it. Then, instead of POSTing a new document with the url http://192.168.99.100:9200/correntistas/correntista/1, I did it with http://192.168.99.100:9200/correntistas/_doc/1 (note I changed from my desired document label "correntista" to "_doc"). Then I GET with http://192.168.99.100:9200/correntistas/_doc/_search. What is the difference between PUT and POST for an index, and why does _doc work while "correntista" does not (the index name is "correntistas", plural of "correntista")? – Jim C May 10 '20 at 02:30
  • @JimC from ES 7.X types are removed; please see https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html for more info, which explains why `_doc` works and why your `correntista` doesn't – Amit May 10 '20 at 02:33
  • @JimC, yeah, there are quite some breaking changes, and the implicit type `_doc` will also be removed in ES 8.X (soon to be released); it is kept just for backward compatibility. Also, in 7.X creating an index works with PUT, not POST, as you can update it later on. I would suggest you keep updating your version, which will make later migration much easier :) – Amit May 10 '20 at 02:36
  • You have answered my question. A last comment, if possible: does it mean I am obligated to name my document type, and must I use _doc? And if I want two different document types, must I create two different indexes? Do you recommend I not use even _doc, since it will be removed completely? – Jim C May 10 '20 at 02:47
  • @JimC, `_doc` is a temporary replacement for types in an Elasticsearch index, kept only for backward compatibility, and yes, if you want to create two different types of documents, you need to create two indexes. In ES 7.X `_doc` is internal and mandatory, but in ES 8 it will be removed completely, and removing it shouldn't be difficult once ES publishes upgrade instructions. – Amit May 10 '20 at 05:30