I have an index with the following structure:
item_id: unique item ID
sale_date: date of the sale
price: price of the sale on that date
I want to create a histogram of the latest sale price per item, that is, a terms aggregation on item_id followed by a histogram over each item's most recent price.
My first choice was a terms aggregation on item_id with a top_hits sub-aggregation (size 1, sorted by sale_date descending) to pick the latest price, and then to build the histogram on the Python side.
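A minimal sketch of that first approach, assuming item_id is mapped as a keyword, sale_date as a date, and price as a number. The function names, the aggregation names ("items", "latest_sale"), the max_items limit, and the bucket width are hypothetical and chosen only for illustration. The request body would be sent with whichever Elasticsearch client is in use.

```python
import math
from collections import Counter


def build_latest_price_query(max_items=10000):
    """Build a request body: one terms bucket per item_id, each with its latest sale."""
    return {
        "size": 0,  # return aggregations only, no top-level search hits
        "aggs": {
            "items": {
                "terms": {"field": "item_id", "size": max_items},
                "aggs": {
                    "latest_sale": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"sale_date": {"order": "desc"}}],
                            "_source": {"includes": ["price"]},
                        }
                    }
                },
            }
        },
    }


def histogram_from_response(response, interval=10):
    """Count items per price bucket, keyed by the bucket's lower bound."""
    counts = Counter()
    for bucket in response["aggregations"]["items"]["buckets"]:
        # top_hits with size 1 returns exactly one hit: the most recent sale
        price = bucket["latest_sale"]["hits"]["hits"][0]["_source"]["price"]
        counts[math.floor(price / interval) * interval] += 1
    return dict(sorted(counts.items()))
```

Every item's latest hit still has to travel to the client before it can be bucketed, and that is where this approach stops scaling.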
However, the data runs to tens of millions of records per month, so it is not viable to download all the sources in time to build the histogram.
Note: Some items sell daily and others at different intervals, which makes it tricky to simply pick the latest sale_date.
Update:
Input: item-based sales time-series data.
Output: histogram of the number of items that fall into each price bucket, based on each item's latest sale.
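To make the expected input and output concrete, here is a small pure-Python reference of the computation on made-up sample rows. The function name, the sample data, and the bucket width of 10 are assumptions for illustration only. The goal is to get the same result out of the index without pulling every document to the client.

```python
import math
from collections import Counter


def latest_price_histogram(sales, interval=10):
    """sales: iterable of (item_id, sale_date as ISO string, price)."""
    latest = {}
    for item_id, sale_date, price in sales:
        # ISO-formatted dates compare correctly as plain strings
        if item_id not in latest or sale_date > latest[item_id][0]:
            latest[item_id] = (sale_date, price)
    # Bucket each item's latest price by the lower bound of its interval
    counts = Counter(
        math.floor(price / interval) * interval for _, price in latest.values()
    )
    return dict(sorted(counts.items()))


# Made-up sample: items sell at different intervals
sales = [
    ("A", "2020-01-01", 12.0),
    ("A", "2020-01-03", 27.0),  # latest sale for A
    ("B", "2020-01-02", 21.0),  # latest sale for B
    ("C", "2019-12-15", 5.0),   # C last sold weeks earlier
]
print(latest_price_histogram(sales))  # {0: 1, 20: 2}
```

Item C shows why a filter on a single recent date is not enough: its latest sale is older than the others but still has to be counted.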