0

I'm trying to create a query that returns documents based on a calculation that is best compared to a SumIf in Excel.

A very simplified version of my mapping is:

{
  "item": {
    "properties": {
      "name": {
        "type": "text"
      },
      "data": {
        "properties": {
          "inventory": {
            "properties": {
              "locations": {
                "type": "nested",
                "properties": {
                  "state": {
                    "type": "keyword"
                  },
                  "amount": {
                    "type": "double"
                  }
                }
              },
              "total": {
                "type": "double"
              }
            }
          }
        }
      }
    }
  }
}

(This data wouldn't make much sense in a real system, but I can't post the actual mapping.)

Two example documents could be:

{
  "name": "guitar",
  "data": {
    "inventory": {
      "locations": [
        {
          "state": "CA",
          "amount": 200
        },
        {
          "state": "AZ",
          "amount": 100
        },
        {
          "state": "NY",
          "amount": 300
        }
      ],
      "total": 600
    }
  }
}
{
  "name": "piano",
  "data": {
    "inventory": {
      "locations": [
        {
          "state": "CA",
          "amount": 200
        },
        {
          "state": "AZ",
          "amount": 100
        },
        {
          "state": "KY",
          "amount": 50
        }
      ],
      "total": 350
    }
  }
}

I'd like to be able to return a query that returns documents where the sum of the amount of the 'NY' and 'CA' nested documents is larger than 150. In the case with the example data it should return both documents, even though NY is not in the first document, since CA by itself is already at 200.

I've read most of the aggregations documentation that Elastic provides, but I've been unable to find a way to do this.

I'd love to get some pointers on how to solve this issue.

1 Answers1

0

I don't think you need aggregations but a customised search query which is done via Script Query.

Unfortunately, the way the nested documents are designed, it is not possible to achieve that via Script Query.

That said, this might not be an exact solution, but the below query would return all the documents that has either CA and/or NY and would sort them based on the sum of their amount.

That way, documents having higher sum of amounts would be appearing at the beginning of the results followed by documents having less value.

Query

POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "data.inventory.locations",
            "query": {
              "match": {
                "data.inventory.locations.state": "CA"
              }
            }
          }
        },
        {
          "nested": {
            "path": "data.inventory.locations",
            "query": {
              "match": {
                "data.inventory.locations.state": "NY"
              }
            }
          }

        }
      ]
    }
  },
  "sort": [
    {
      "data.inventory.locations.amount": {
        "order": "desc",
        "mode": "sum",
        "nested_path": "data.inventory.locations",
        "nested_filter": {
          "terms": {
            "data.inventory.locations.state": ["CA","NY"]
          }
        }
      }
    }
  ]
}

Let me know if this helps!

Kamal Kunjapur
  • 8,547
  • 2
  • 22
  • 32
  • Hi @Kamal, sorry for the late response. Unfortunately i haven't been able to work on this until now. I've implemented your solution in my real data and it does exactly what you described. Unfortunately, I need my results sorted by date (which wasn't in the description), so I'll still need to find some way to filter results based on that sum. I'll also need to do aggregations on top of this filtered data. I guess I'll have to dig deeper into the Elastic docs to see if I can actually find a fitting solution. I your post can be a useful nudge in the right direction. – Stephan Janssen Feb 06 '19 at 17:16
  • hey @StephanJanssen, that's perfectly fine. I hope you get your solution, but if you need any help, do let me know. Good luck :) – Kamal Kunjapur Feb 07 '19 at 13:02