
I'm building a blog-like app with Flask (based on Miguel Grinberg's Megatutorial) and I'm trying to set up ES indexing that would support an autocomplete feature. I'm struggling with setting up the indexing correctly.

I started with a (working) simple indexing mechanism:

from flask import current_app

def add_to_index(index, model):
    if not current_app.elasticsearch:
        return
    payload = {}
    for field in model.__searchable__:
        payload[field] = getattr(model, field)
    current_app.elasticsearch.index(index=index, id=model.id, body=payload)
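
For context, this is roughly how the helper gets called; a minimal sketch, assuming a Post model that defines __searchable__ = ['body'] as in the Megatutorial:

# Sketch: Post comes from the Megatutorial and defines __searchable__ = ['body'],
# so this call indexes {'body': post.body} under document id post.id.
add_to_index('posts', post)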

After some fun with Google, I found that my body could look something like this (probably with fewer analyzers than necessary, but I'm copying it exactly as I found it somewhere, where the author claims it works):

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {},
        "analyzer": {
          "keyword_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding",
              "trim"
            ],
            "char_filter": [],
            "type": "custom",
            "tokenizer": "keyword"
          },
          "edge_ngram_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "edge_ngram_tokenizer"
          },
          "edge_ngram_search_analyzer": {
            "tokenizer": "lowercase"
          }
        },
        "tokenizer": {
          "edge_ngram_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 5,
            "token_chars": [
              "letter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "field": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keywordstring": {
              "type": "text",
              "analyzer": "keyword_analyzer"
            },
            "edgengram": {
              "type": "text",
              "analyzer": "edge_ngram_analyzer",
              "search_analyzer": "edge_ngram_search_analyzer"
            },
            "completion": {
              "type": "completion"
            }
          },
          "analyzer": "standard"
        }
      }
    }
  }
}

I figured out that I can modify the original mechanism to something like this:

fields = {}
for field in model.__searchable__:
    temp = getattr(model, field)
    fields[field] = {
        "properties": {
            "type": "text",
            "fields": {
                "keywordstring": {
                    "type": "text",
                    "analyzer": "keyword_analyzer"
                },
                "edgengram": {
                    "type": "text",
                    "analyzer": "edge_ngram_analyzer",
                    "search_analyzer": "edge_ngram_search_analyzer"
                },
                "completion": {
                    "type": "completion"
                }
            },
            "analyzer": "standard"
        }
    }
payload = {
    "settings": {
        "index": {
          "analysis": {
            "filter": {},
            "analyzer": {
              "keyword_analyzer": {
                "filter": [
                  "lowercase",
                  "asciifolding",
                  "trim"
                ],
                "char_filter": [],
                "type": "custom",
                "tokenizer": "keyword"
              },
              "edge_ngram_analyzer": {
                "filter": [
                  "lowercase"
                ],
                "tokenizer": "edge_ngram_tokenizer"
              },
              "edge_ngram_search_analyzer": {
                "tokenizer": "lowercase"
              }
            },
            "tokenizer": {
              "edge_ngram_tokenizer": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 5,
                "token_chars": [
                  "letter"
                ]
              }
            }
          }
        }
    },
    "mappings": fields
}

but that's where I'm lost. Where do I put the actual content (temp = getattr(model, field)) in this document so that the whole thing works? I couldn't find any example or relevant part of the documentation that covers updating an index with slightly more complex mappings, so is this even correct/doable? Every guide I see covers bulk indexing, and somehow I fail to make the connection.

Jakub Królikowski

1 Answer


I think you are a little bit confused, let me try to explain. What you want is to add one document to Elasticsearch with:

current_app.elasticsearch.index(index=index, id=model.id, body=payload)

This uses the index() method defined in the elasticsearch-py lib. Check the example here: https://elasticsearch-py.readthedocs.io/en/master/index.html#example-usage. The body must be your document, a simple dict, as shown in the example from the doc.
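
In other words, the document you index stays as simple as in your original add_to_index(). A minimal sketch, assuming a Post model with __searchable__ = ['body'] as in the Megatutorial:

# Sketch: the body passed to index() is just the field values, nothing else.
# (Post and __searchable__ = ['body'] are assumptions taken from the Megatutorial.)
payload = {field: getattr(model, field) for field in model.__searchable__}
# e.g. {'body': 'my first post'}
current_app.elasticsearch.index(index=index, id=model.id, body=payload)

The settings and mappings do not go into this call at all.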

What you set up is the settings of the index, which is a different thing. To take a database analogy, you are putting the schema of the table inside a single row (the document).

To apply the given settings you need to use put_settings, as defined here: https://elasticsearch-py.readthedocs.io/en/master/api.html?highlight=settings#elasticsearch.client.ClusterClient.put_settings
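
For example, here is a rough sketch of creating the index once, with your settings and mapping, before any documents are added. I am assuming Elasticsearch 7+ (where the mapping has no document-type level) and an index named 'posts' with a 'body' field; indices.create() applies both settings and mappings in a single call, as an alternative to calling put_settings and put_mapping separately:

from flask import current_app

def create_posts_index(index='posts'):
    # Sketch: run this once, before indexing any documents.
    # The index name and the 'body' field are assumptions for illustration.
    es = current_app.elasticsearch
    if es.indices.exists(index=index):
        return
    es.indices.create(
        index=index,
        body={
            "settings": {
                "index": {
                    "analysis": {
                        "analyzer": {
                            "edge_ngram_analyzer": {
                                "filter": ["lowercase"],
                                "tokenizer": "edge_ngram_tokenizer"
                            },
                            "edge_ngram_search_analyzer": {
                                "tokenizer": "lowercase"
                            }
                        },
                        "tokenizer": {
                            "edge_ngram_tokenizer": {
                                "type": "edge_ngram",
                                "min_gram": 2,
                                "max_gram": 5,
                                "token_chars": ["letter"]
                            }
                        }
                    }
                }
            },
            "mappings": {
                "properties": {
                    "body": {
                        "type": "text",
                        "fields": {
                            "edgengram": {
                                "type": "text",
                                "analyzer": "edge_ngram_analyzer",
                                "search_analyzer": "edge_ngram_search_analyzer"
                            }
                        }
                    }
                }
            }
        }
    )

After that, your add_to_index() keeps indexing plain documents into the same index, and they get analyzed according to that mapping.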

I hope this helps you.

Gabriel
  • I am indeed confused quite a bit ; ) This is my first contact with ES, so I may be missing most obvious things. If I understand correctly, I'd put everything that's under 'settings' in put_settings. Where does the mapping go? Is it part of the settings (and I add one document using keys from mapping?) or is it part of the document (and I insert specific values... where?)? – Jakub Królikowski Oct 11 '19 at 07:32
  • "indexing that would support autocomplete feature." <--- you'd better to check a specific tutorial elastic is not so simple there's a lot of concept to understand before. But about your question to set the settings yes is put_settings and for mapping is put_mapping you can get more example from the documentation. https://elasticsearch-py.readthedocs.io/en/master/api.html?highlight=mapping#elasticsearch.client.IndicesClient.put_mapping – Gabriel Oct 11 '19 at 07:54
  • Oh yeah, I'm aware that there'll be significantly more to do and not just with pure ES. Still, I need to start somewhere. Thanks, I'll try to follow this. – Jakub Królikowski Oct 11 '19 at 08:00