I am sorting results from Elasticsearch (5.1.1) based on a calculation of values in nested key/value structures.

The sorting has to:

  1. find values from given keys across multiple nested structures
  2. multiply those values with one another
  3. use this multiplication as a score for sorting

What I have currently works, but it is really slow and inefficient. I wrote a Painless script because of the multiplication in step 2 above. The script does the following:

  1. loop through all keys to find their respective matching value
  2. for the first match, save the value in a variable; for subsequent matches, multiply saved value with current value, and save that in the aforementioned variable
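As a rough illustration of that logic (written in Python rather than Painless, so it can be run anywhere; the function name `ratio_product` and the sample values are made up for this sketch and are not my actual script):

```python
def ratio_product(ratios, wanted_keys):
    """Multiply the values of all entries whose Key is in wanted_keys.

    `ratios` stands in for the nested "my_ratios" list read from _source.
    Returns None when no key matches.
    """
    product = None  # nothing matched yet
    for entry in ratios:  # visits every nested item, which is the slow part
        if entry["Key"] in wanted_keys:
            if product is None:
                product = entry["Value"]    # first match: save the value
            else:
                product *= entry["Value"]   # later matches: multiply into it
    return product


# Hypothetical sample data, chosen so the arithmetic is exact.
ratios = [
    {"Key": "Key1", "Value": 0.5},
    {"Key": "Key2", "Value": 4.0},
    {"Key": "Key3", "Value": 3.0},
]
score = ratio_product(ratios, {"Key1", "Key3"})  # 0.5 * 3.0
```

The resulting `score` is what gets used as the sort value.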

I think the inefficiency is due to:

  1. looping through all nested items (there are a lot per document, and many documents)
  2. I am using params['_source'], which has a reputation for slowing things down. AFAIK, I have to use params['_source'] to address nested values in Painless

Now to the question: how can I solve this problem more efficiently? Am I going about this the wrong way entirely, or is there a way to not use params['_source']?

My mapping (the nested structure is "my_ratios"):

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "a_value": {
            "type": "long"
          },
          "my_ratios": {
            "type": "nested",
            "properties": {
              "Key": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "Value": {
                "type": "float"
              }
            }
          }
        }
      }
    }
  }
}

Example of nested key/value structure:

[
  {
    "Key": "Key1",
    "Value": 0.4898
  },
  {
    "Key": "Key2",
    "Value": 0.14286
  },
  {
    "Key": "Key3",
    "Value": 6.12245
  },
  ...
]
Scarabas

1 Answer

I'm afraid your only options are to remodel your data, or to keep a copy of the relevant data structures just for your sorting.

As far as I know, Elasticsearch was never intended to be efficient on params['_source'], and you do need - as you pointed out - to use this to access nested objects from Painless. In other words - Elasticsearch is not efficient when doing custom ops on nested objects.
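One possible reading of "remodel" is to write a flat copy of the key/value pairs next to the nested ones at index time. The sketch below is only an illustration under assumptions: the field name `my_ratios_flat` and the helper `flatten_ratios` are hypothetical, and it assumes each key occurs at most once per document and that the number of distinct keys is small enough not to bloat the mapping.

```python
def flatten_ratios(doc):
    """Return a copy of doc with a flat object built from "my_ratios".

    [{"Key": "Key1", "Value": 0.5}, ...] becomes {"Key1": 0.5, ...}
    under the hypothetical field "my_ratios_flat".
    """
    flat = {entry["Key"]: entry["Value"] for entry in doc.get("my_ratios", [])}
    out = dict(doc)                # shallow copy; the original doc is untouched
    out["my_ratios_flat"] = flat   # hypothetical field name, not in the mapping above
    return out


doc = {
    "a_value": 7,
    "my_ratios": [
        {"Key": "Key1", "Value": 0.5},
        {"Key": "Key2", "Value": 4.0},
    ],
}
indexed_doc = flatten_ratios(doc)
```

If such a flat object were mapped as ordinary numeric fields, a sort script could in principle read each key through doc values (for example `doc['my_ratios_flat.Key1'].value`) rather than through params['_source']. Whether that pays off for your data is something you would have to measure.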

Make sure that you're exiting the loops as soon as you have satisfied your requirements, to avoid unnecessary iterations - if you haven't already done this, this could bring about some improvements.
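A minimal sketch of that early exit, shown in Python for illustration only (your real script is Painless). The name `ratio_product_early_exit` is made up, and the sketch assumes each key appears at most once per document:

```python
def ratio_product_early_exit(ratios, wanted_keys):
    """Multiply the values of the wanted keys, stopping once all are found.

    Returns (product, visited), where `visited` counts the nested items
    inspected. The product is 1.0 if no key matches.
    """
    remaining = set(wanted_keys)  # keys still to be found
    product = 1.0
    visited = 0
    for entry in ratios:
        visited += 1
        if entry["Key"] in remaining:
            product *= entry["Value"]
            remaining.discard(entry["Key"])
            if not remaining:     # every requested key has been seen
                break             # skip the rest of the nested items
    return product, visited


# Hypothetical sample data.
ratios = [
    {"Key": "Key1", "Value": 0.5},
    {"Key": "Key2", "Value": 4.0},
    {"Key": "Key3", "Value": 3.0},
    {"Key": "Key4", "Value": 8.0},
]
product, visited = ratio_product_early_exit(ratios, {"Key1", "Key2"})
```

Here the loop stops after two of the four items. The saving depends on where the wanted keys sit in the list, so the worst case is still a full pass.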

The_Torst