
I have an index with the following structure:

item_id: unique item id
sale_date: date of the sale
price: price of the sale on that date

I want to create a histogram of the latest sale price per item: a terms aggregation on item_id, followed by a histogram over each item's last (most recent) price.

My first idea was a terms aggregation on item_id, picking the price from a top_hits sub-aggregation (size 1, ordered by sale_date descending), and then building the histogram on the Python side.
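A minimal sketch of that first approach, for reference. The aggregation names `by_item` and `latest_sale`, the helper names, and the `max_items` default are illustrative assumptions, not part of the original setup; the code only builds the request body and buckets a response-shaped dict, so sending the request is left to whichever client is in use.

```python
from collections import Counter


def build_latest_price_query(max_items=10000):
    """Request body: one bucket per item_id, holding only its most recent sale."""
    return {
        "size": 0,  # skip top-level hits, only aggregations are needed
        "aggs": {
            "by_item": {  # assumed aggregation name
                "terms": {"field": "item_id", "size": max_items},
                "aggs": {
                    "latest_sale": {  # assumed aggregation name
                        "top_hits": {
                            "size": 1,
                            "sort": [{"sale_date": {"order": "desc"}}],
                            "_source": {"includes": ["price"]},
                        }
                    }
                },
            }
        },
    }


def latest_price_histogram(response, interval):
    """Count items per price bucket; keys are the lower bound of each bucket."""
    counts = Counter()
    for bucket in response["aggregations"]["by_item"]["buckets"]:
        hits = bucket["latest_sale"]["hits"]["hits"]
        if not hits:
            continue
        price = hits[0]["_source"]["price"]
        counts[(price // interval) * interval] += 1
    return dict(sorted(counts.items()))
```

With a bucket width of 10, items whose latest prices are 5, 12, 18 and 25 would end up as one item in the 0 bucket, two in the 10 bucket and one in the 20 bucket.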

However, the index holds tens of millions of records for a single month, so downloading all the sources in time to build the histogram client-side is not viable.
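One possible way to keep each response small is to page through the items with a composite aggregation and update the histogram counts as the pages arrive, instead of fetching everything at once. The sketch below assumes a hypothetical `search` callable that takes a request body (a dict) and returns the response as a dict; the aggregation names and `page_size` are also assumptions. It still transfers one price per item, so whether it is fast enough at this scale would need to be measured.

```python
from collections import Counter


def iter_latest_prices(search, page_size=1000):
    """Yield (item_id, latest price) pairs by paging a composite aggregation.

    `search` is a hypothetical callable: request body (dict) -> response (dict).
    """
    after_key = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"item_id": {"terms": {"field": "item_id"}}}],
        }
        if after_key is not None:
            composite["after"] = after_key  # resume after the previous page
        body = {
            "size": 0,
            "aggs": {
                "by_item": {
                    "composite": composite,
                    "aggs": {
                        "latest_sale": {
                            "top_hits": {
                                "size": 1,
                                "sort": [{"sale_date": {"order": "desc"}}],
                                "_source": {"includes": ["price"]},
                            }
                        }
                    },
                }
            },
        }
        agg = search(body)["aggregations"]["by_item"]
        for bucket in agg["buckets"]:
            hits = bucket["latest_sale"]["hits"]["hits"]
            if hits:
                yield bucket["key"]["item_id"], hits[0]["_source"]["price"]
        after_key = agg.get("after_key")
        if after_key is None or not agg["buckets"]:
            break  # no more pages


def streaming_histogram(search, interval, page_size=1000):
    """Accumulate item counts per price bucket without keeping all prices."""
    counts = Counter()
    for _item_id, price in iter_latest_prices(search, page_size):
        counts[(price // interval) * interval] += 1
    return dict(sorted(counts.items()))
```

Only the bucket counts are held in memory, so the client-side cost grows with the number of price buckets rather than with the number of items.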

Note: some items sell daily and others at irregular intervals, which makes it tricky to simply filter on the latest sale_date.

Updated:

Input: Item-based sales time series data.

Output: Histogram of the number of items that fall into each price bucket, based on each item's latest sale.

Jugraj Singh

1 Answer


I have a workaround that I used in a similar case. You can use a max aggregation on the date field and order a terms aggregation by the value of that nested aggregation, like this:

{
  "size": 0,
  "aggs": {
    "item ID": {
      "terms": {
        "field": "item_id",
        "size": 10000
      },
      "aggs": {
        "price": {
          "terms": {
            "field": "price",
            "size": 1,
            "order": {
              "sale_date": "desc"
            }
          },
          "aggs": {
            "sale_date": {
              "max": {
                "field": "sale_date"
              }
            }
          }
        }
      }
    }
  }
}

I hope this helps; please let me know whether it works for you.

Nimer Awad

I think you did not understand the question. These are not the results I am expecting. The aim is a histogram aggregation with the `selling price` terms on the x axis and the count of item_ids on the y axis. – Jugraj Singh Dec 27 '19 at 12:22