0

I'm running a query on elasticsearch using function_score where the value of a nested field is used for the calculation (price in this case). Which of the following is a better way to index the data?

{
      "name": "John",
      "age": 50,
      "country": "US",
      "subscription": {
        "Plan1": {
          "price": 100,
          "date": "June 5th"
        },
        "Plan2": {
          "price": 50,
          "date": "June 6th"
        }
      }
    }

OR

{
  "name": "John",
  "age": 50,
  "country": "US",
  "subscription": [
    {
      "name": "Plan1",
      "price": 100,
      "date": "June 5th"
    },
    {
      "name": "Plan2"
      "price": 50,
      "date": "June 6th"
    }
  ]
}

The query would be filtering on the "plan name" and "price", and "price" will be used for score calculation. The number of plans maybe upwards of 20.

Edit 1: Sample query for approach 2

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "createdatutc": {
                  "gte": "2022-11-01T00:00:00.000Z",
                  "lt": "2023-05-06T00:00:00.000Z",
                  "format": "strict_date_optional_time"
                }
              }
            },
            {
              "terms": {
                "country": [
                  "US"
                ]
              }
            },
            {
              "term": {
                "subscription.name": {
                  "value": "Plan1"
                } 
              }
            }
          ]
        }
      },
      "functions": [
        {
          "filter": {
            "query_string": {
              "default_field": "name",
              "query": "\"john\""
            }
          },
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
HarryClifton
  • 129
  • 1
  • 7

1 Answers1

1

It depends on how many plans you have. If it's just two or a handful then the first option is better, otherwise you need to make subscription a nested object and nested object are less optimal in terms of query performance.

With the first option, filtering on the plan name and price can be done with a single condition on subscription.Plan1.price: 100, while with the second option, you need two conditions (hence why subscription needs to be nested), one on subscription.name: Plan1 and another on subscription.price: 100

UPDATE 1: using option 1

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "createdatutc": {
                  "gte": "2022-11-01T00:00:00.000Z",
                  "lt": "2023-05-06T00:00:00.000Z",
                  "format": "strict_date_optional_time"
                }
              }
            },
            {
              "terms": {
                "country": [
                  "US"
                ]
              }
            },
            {
              "exists": {
                "field": "subscription.Plan1.price"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "filter": {
            "query_string": {
              "default_field": "name",
              "query": "\"john\""
            }
          },
          "field_value_factor": {
            "field": "subscription.Plan1.price",
            "factor": 1.2
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}
Val
  • 207,596
  • 13
  • 358
  • 360
  • thanks for your answer. I've indexed a few documents using the second approach, and edited my question with the sample query. Is there a way to optimise the script component, where I need the price for the matched plan? My current script seems to be intensive. – HarryClifton Jun 26 '23 at 08:02
  • 1
    If you took the second approach, `subscription` must be nested, but it doesn't seem to be the case according to your query. See my updated answer on how to do it with the first option AND without needing a script. – Val Jun 26 '23 at 08:17