
I'm using Elasticsearch (OpenSearch) to store multiple objects in an index. Each object has a unique ID, a creation date, and an "amount" field that stores an integer value. This "amount" value changes periodically, so I update the objects in place, using the IDs to match the existing documents.

What I want is to display a histogram of the total amount across all objects over time. The problem is that I need to preserve the historical data somehow in order to display it on the graph.

What are my options for implementing this? Is there an automatic way to aggregate and preserve these totals, or should I do it manually from my application, e.g. by sending the totals instead of the individual objects?

The data structure:

```
[
  { id: '{uuidv4}', amount: 100 },
  { id: '{uuidv4}', amount: 25 },
  { id: '{uuidv4}', amount: 150 },
  { id: '{uuidv4}', amount: 0 },
  ...
]
```
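
For reference, this is roughly how an object gets updated in place today (a sketch; the `objects` index name and the `createdAt` field name are placeholders):

```
# full-document upsert keyed by the object's ID, so repeated PUTs overwrite "amount"
PUT objects/_doc/{uuidv4}
{ "id": "{uuidv4}", "amount": 125, "createdAt": "2022-02-10T09:00:00Z" }
```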
Slava Fomin II

2 Answers


From an Elasticsearch point of view, you would normally store each change as an individual record rather than updating the existing one. That way you keep the history and can run your aggregations, and still show the latest values.

Just treat it like any other time-based data: logs, click streams, transactions, etc.
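
A minimal sketch of that event-per-change model, assuming a hypothetical `amount-events` index: `timestamp` is mapped as a date and `id` as a keyword so events can be bucketed by time and grouped per object.

```
# one-time mapping (index name is hypothetical)
PUT amount-events
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "id":        { "type": "keyword" },
      "amount":    { "type": "integer" }
    }
  }
}

# every change to an object becomes a new, immutable event document
POST amount-events/_doc
{ "timestamp": "2022-02-11T12:00:00Z", "id": "{uuidv4}", "amount": 100 }
```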

warkolm
  • Thank you for your answer. However, could you give an example of how exactly I should change my data structure, what events I should send, and how to aggregate them to render the desired graph? Do you propose sending the changes in amount and then summing all the changes up to a specific point in time? – Slava Fomin II Feb 11 '22 at 13:27
  • `{"timestamp": time, "id": "uuidv4", "amount": 100 }` is how you would typically do it in Elasticsearch, then each change is recorded as a stand alone event. from there you run aggregations over it – warkolm Feb 11 '22 at 21:13

Elasticsearch isn't a great fit for processing historical or time-series data. It has good tooling for cases like logging, but falls short when the use case assumes some continuity in the data, like metric gauges or real-world prices. It is technically possible to create a cron job that retrieves the current values every minute and stores them somewhere (sketched below), but it will be inefficient and scale poorly.
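
For illustration, such a cron job could dump a snapshot document per object via the bulk API (the `amount-snapshots` index name is hypothetical):

```
# one snapshot document per object per tick; the cron job repeats this every minute
POST _bulk
{ "index": { "_index": "amount-snapshots" } }
{ "timestamp": "2022-02-11T12:00:00Z", "id": "{uuidv4}", "amount": 100 }
{ "index": { "_index": "amount-snapshots" } }
{ "timestamp": "2022-02-11T12:00:00Z", "id": "{uuidv4}", "amount": 25 }
```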

Instead, it would make sense to use a specialized time-series database, such as InfluxDB, which supports aggregations like this together with the ability to zoom in and out of the timeline.

ilvar
  • Elasticsearch can handle this easily. I am not sure how you think it fails at this, given that logs are time-series data and it handles logs easily enough – warkolm Feb 11 '22 at 00:33
  • This use case is not logs but rather metric gauges, and (if I understood it correctly) we want to build a graph of some aggregation over multiple gauges over time. Elasticsearch doesn't have a concept of "time"; it can only create or update documents and run scripts on them to get data like "how long ago was this doc created". You can adjust the architecture to allow point-in-time queries like "look for docs that had a value of `foo` equal to `bar` a week ago", but I don't know of any architecture that would allow fast aggregations on that over time. – ilvar Feb 11 '22 at 15:35
  • They're both time-based data, and Elasticsearch aggregates across billions of time-based data points with sub-second response times – warkolm Feb 11 '22 at 21:15
  • It is indeed _technically possible_ to scan all gauges every minute and write a doc for each into ES and then aggregate on them. It just won't be the best tool for this job. – ilvar Feb 11 '22 at 22:53