How do I use the default _analyze API with elasticsearch-dsl in Python?

My query looks like this:

from elasticsearch_dsl import Q, Search

query = Q('regexp', field_name="f04((?!z).)*")
search_obj = Search(using=conn, index=index_name, doc_type=type_name).query(query)
response = search_obj[0:count].execute()

Where do I put the analyze() method so that I can see how my "f04((?!z).)*" is broken into terms? It seems that '!' doesn't work as a regex character. How do I change the analyzer if the default analyzer cannot handle '!' as a regexp character?

I'm very new to this and finding it a little hard to place the analyze method correctly in my code. Please help.

zubug55

1 Answer

I'm not sure what exactly you want to achieve. If you posted a curl query that does what you want, it would be easier to translate it into the Elasticsearch DSL or elasticsearch-py interface.

If you're looking for a Python equivalent of the _analyze API, you can get one with elasticsearch-py; I'm not sure whether Elasticsearch DSL can do it. Say I want to see how my string jestem biały miś is analyzed by my analyzer named morfologik. Using curl, I would just run:

$ curl -XGET "http://localhost:9200/morf_texts/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "morfologik",
  "text": "jestem biały miś"
}'

{
  "tokens": [
    {
      "token": "być",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "biały",
      "start_offset": 7,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "miś",
      "start_offset": 13,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "misić",
      "start_offset": 13,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

In order to achieve the same result using elasticsearch-py, you can run the following:

from elasticsearch import Elasticsearch

client = Elasticsearch()

# Same request as the curl call above: run the "morfologik" analyzer
# against the morf_texts index and return the resulting tokens.
client.indices.analyze(
    index="morf_texts",
    body={
        "analyzer": "morfologik",
        "text": "jestem biały miś",
    },
)

The output of the analyze method is the same as that of the curl request above:

{'tokens': [{'token': 'być',
   'start_offset': 0,
   'end_offset': 6,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'biały',
   'start_offset': 7,
   'end_offset': 12,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'miś',
   'start_offset': 13,
   'end_offset': 16,
   'type': '<ALPHANUM>',
   'position': 2},
  {'token': 'misić',
   'start_offset': 13,
   'end_offset': 16,
   'type': '<ALPHANUM>',
   'position': 2}]}
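
If you only care about which terms the text was broken into, you can pull them out of that response yourself. Below is a minimal sketch, assuming the response is the plain dict shown above; tokens_by_position is a hypothetical helper name, not part of elasticsearch-py, and sample_response is a trimmed copy of the output with offsets and types left out.

```python
from collections import defaultdict


def tokens_by_position(response):
    """Group token strings from an _analyze response by their position."""
    grouped = defaultdict(list)
    for entry in response.get("tokens", []):
        grouped[entry["position"]].append(entry["token"])
    # Return a plain dict ordered by position for predictable printing
    return {pos: grouped[pos] for pos in sorted(grouped)}


# Trimmed copy of the response shown above (offsets and types omitted)
sample_response = {
    "tokens": [
        {"token": "być", "position": 0},
        {"token": "biały", "position": 1},
        {"token": "miś", "position": 2},
        {"token": "misić", "position": 2},
    ]
}

print(tokens_by_position(sample_response))
# {0: ['być'], 1: ['biały'], 2: ['miś', 'misić']}
```

Grouping by position makes it easy to spot words for which the analyzer emitted more than one term, such as miś here.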
mrapacz