I want to write a query that analyzes the values of one or more fields.

That is, the current analyze API requires text to be passed in. Instead of passing the text myself, I want to pass a field and have its value analyzed.

If I have a document like this:

{
    "desc": "A document description",
    "name": "This name is not original",
    "amount": 3000
}

I would like to get back something like this:

{
    "desc": ["document", "description"],
    "name": ["name", "original"],
    "amount": 3000
}
Ayman
  • Could you explain this better: "current analyzers require text to function, instead of passing text I want to pass a field value"? A field value is, generally, text, so your question is not clear. – Lupanoide Apr 19 '18 at 09:08
  • @Lupanoide What I meant is that, per https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html, to analyze something you need to pass a `text` property, so you need to know the content of the text in advance. Instead of something static, I need to pass something from a field. – Ayman Apr 19 '18 at 09:10
  • You are wrong: it is only when querying the _analyze API that you need to pass a text property. That API exists to help you understand the behaviour of an analyzer. If you define an analyzer in an index, then all values of the fields mapped with that analyzer will be analyzed that way. ES analyzes field values automatically; the _analyze API only serves to test your analyzer. – Lupanoide Apr 19 '18 at 09:16
  • OK, I get your point, but what if I want to get the keywords just once, or under certain conditions, but not every time I query the data? – Ayman Apr 19 '18 at 09:39
  • cool, if all fails, I'll try to use that workaround. – Ayman Apr 19 '18 at 09:50

1 Answer

You can use Term Vectors or Multi Term Vectors to achieve what you're looking for:

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html

You have to specify the IDs of the documents you want as well as the fields. It returns the analyzed tokens of each requested field for every document, along with some other information (statistics, positions, offsets) that you can easily disable.

GET /exampleindex/_doc/_mtermvectors
{
  "ids": [
    "1","2"
  ],
  "parameters": {
    "fields": [
      "*"
    ]
  }
}

This will return something along the lines of the following (truncated):

"docs": [
    {
      "_index": "exampleindex",
      "_type": "_doc",
      "_id": "1",
      "_version": 2,
      "found": true,
      "took": 0,
      "term_vectors": {
        "desc": {
          "field_statistics": {
            "sum_doc_freq": 5,
            "doc_count": 2,
            "sum_ttf": 5
          },
          "terms": {
            "amazing": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 1,
                  "start_offset": 3,
                  "end_offset": 10
                }
              ]
            },
            "an": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 2
                }
              ]
            }
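
To get from a response like this to the flat shape asked for in the question, the `terms` keys of each field can be collected on the client side. The following is only a sketch in plain Python: the helper names `build_mtermvectors_body` and `terms_by_field` are made up for illustration, the request options used (`positions`, `offsets`, `payloads`, `term_statistics`, `field_statistics`) are the standard term vectors parameters, and the sample document only mirrors the two terms shown above.

```python
def build_mtermvectors_body(ids, fields=("*",)):
    """Build an _mtermvectors request body with the extra statistics switched off.

    Hypothetical helper: it only assembles the JSON body as a dict.
    """
    return {
        "ids": list(ids),
        "parameters": {
            "fields": list(fields),
            "positions": True,          # kept so terms can be put back in order
            "offsets": False,
            "payloads": False,
            "term_statistics": False,
            "field_statistics": False,
        },
    }


def terms_by_field(doc):
    """Collapse one entry of the response's `docs` array into
    {field_name: [terms ordered by their position in the text]}."""
    result = {}
    for field, vector in doc.get("term_vectors", {}).items():
        positioned = []
        for term, info in vector.get("terms", {}).items():
            # A term that occurs several times has several token entries.
            # If positions were disabled, fall back to a single entry at 0.
            for token in info.get("tokens", [{}]):
                positioned.append((token.get("position", 0), term))
        result[field] = [term for _, term in sorted(positioned)]
    return result


# Sample shaped like the (truncated) response above; not real server output.
sample_doc = {
    "_id": "1",
    "found": True,
    "term_vectors": {
        "desc": {
            "terms": {
                "amazing": {
                    "term_freq": 1,
                    "tokens": [{"position": 1, "start_offset": 3, "end_offset": 10}],
                },
                "an": {
                    "term_freq": 1,
                    "tokens": [{"position": 0, "start_offset": 0, "end_offset": 2}],
                },
            }
        }
    },
}

print(terms_by_field(sample_doc))  # {'desc': ['an', 'amazing']}
```

Note that this only reshapes what the analyzer produced. To drop words such as "a" or "this", as in the desired output in the question, the field would need an analyzer with a stop filter; as far as I know the default standard analyzer keeps them. Fields for which no term vectors come back (likely the case for the numeric `amount`) simply do not appear in the result and would have to be read from `_source`.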
MRizwan33