This question should go in the category: "a practical, answerable problem that is unique to software development".
Having a system that receives time-based events in the following form (simplified for clarity):
2020-04-10T11:00:00.000Z:system1,UP
2020-04-15T07:00:00.000Z:system1,DOWN
2020-04-20T09:15:00.000Z:system2,UP
2020-05-10T06:30:00.000Z:system3,UP
2020-05-15T22:30:00.000Z:system3,DOWN
2020-05-16T08:30:00.000Z:system3,UP
And imagine we graph this time series on a chart where:
- X-axis: time
- Y-axis: state (
DOWN
/UP
which we can give values1
/0
). Note that the values are reversed in order for the mathematical graph area in this question to make sense.
How can I save these events to a database (or equivalent) and compute the area between the X-axis and the graph (called numerical integration)? The start and end boundaries for the computation can be a full calendar month or the fragment of current month so far.
Example requirement
| <---- compute interval ----> |
|--------------------------------------------- <= DOWN
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
S1 . . .|--------|. . . . . . . . . . . . . . . . . . . . . . . . . . <= UP
| |
S2 . . . . . . . . . . . .|------------------------------------ . . .
| |
|--|
|xx|
S3 . . . . . . . . . . . . . . . . . . .|--------| |---------- . . .
| |
time: 10Apr 15Apr 20Apr 1May 10May 15May 16May 20May/now
Compute the amount of time (marked with xxx
in the graphs above) each of the systems were unavailable in May considering the current date 2020-05-20
. Note that:
system1
andsystem2
didn't even receive any events in May. They are "historically"DOWN
andUP
respectively. This case can go indefinitely long in the past.system3
had a 10 hour interruption in May
And the result should be:
- system1: X minutes (all minutes so far)
- system2: 0 minutes
- system3: 600 minutes
What I tried so far
- I have studied a bit Elasicsearch and this use case does not come as an easy task to solve (one would probably have to consider transform APIs and several aggregations)
- a potential solution with relational databases could be to save each new event with two timestamps (
begin
,end
), and upon saving, to search the history and close the previous open end event for the same system. Then comes the collection of these intervals and intersecting them with the period/month in question, etc.