11

I have series of values that specify the status of jobs, either success or failure.

Each item in the series has a type, and also a grouping ID. It's a dependent sequence so job bar for ID A will run if job foo was successful.

idx time                    id     type    status
1   2018-01-01T00:00:00     A      foo     success
2   2018-01-01T01:00:00     A      bar     success
3   2018-01-02T00:00:00     B      foo     success
4   2018-01-02T02:00:00     B      bar     success
5   2018-01-03T00:00:00     C      foo     success
6   2018-01-03T03:00:00     C      bar     success
7   2018-01-04T00:00:00     D      foo     success
8   2018-01-04T01:00:00     D      bar     success
9   2018-01-05T00:00:00     E      foo     success
10  2018-01-05T01:00:00     E      bar     success
11  2018-01-06T00:00:00     F      foo     failure
12  2018-01-07T00:00:00     G      foo     failure
13  2018-01-08T00:00:00     H      foo     recovery
14  2018-01-08T03:00:00     H      bar     success
15  2018-01-09T00:00:00     I      foo     failure
16  2018-01-10T00:00:00     J      foo     failure
17  2018-01-11T00:00:00     K      foo     recovery
18  2018-01-11T03:00:00     K      bar     success

I need to calculate the time between dependent jobs and recovery time for per job type.

For the data above, the time between dependent jobs series would be:

  • elapsed(1,2) (group A)
  • elapsed(3,4) (group B)
  • elapsed(5,6) (group C)
  • elapsed(7,8) (group D)
  • elapsed(9,10) (group E)
  • elapsed(13,14) (group H)

And the recovery time series would be:

  • elapsed(11, 13) (between groups F and H)
  • elapsed(15, 17) (between groups I and K)

I have used the elapsed() and difference() aggregates to get what I need, but it seems inelegant.

For the recovery time series:

I added a "success" value to my points, with an integer number specifying 1 for "success" or "recovery", and 0 for "failure".

I used difference() on that to determine whether the point was a new failure (-1), repeat of previous status (0), or a recovery (+1). Filtering out the zero values gives me just failures and recoveries.

elapsed() on these points gives the time between failure and recovery, but also recovery and the next failure. These can be differentiated by adding cumulative_sum() of the diff'd successes to the final series (-1 is a failure; 0 is a recovery) and then filtering for only the recoveries.

SELECT elapsed FROM (
    SELECT elapsed(diff) as elapsed, cumulative_sum(diff) as csum FROM (
        SELECT diff from (
            SELECT difference(success) AS diff FROM test WHERE type = 'foo'
        ) WHERE diff != 0
    )
) WHERE csum = 0

Result on the above dataset:

time                 elapsed
----                 -------
2018-01-08T00:00:00Z 172800000000000
2018-01-11T00:00:00Z 172800000000000

For the time between dependent jobs series:

This can be done with the same trick, adding an integer value specifying the type of job. Below the value bar is 1 if type = 'bar' and 0 otherwise.

SELECT elapsed FROM (
    SELECT elapsed(success) as elapsed, difference(bar) as diff
        FROM test WHERE type='bar' OR (type='foo' AND success=1)
) WHERE diff = 1

Result on the above dataset:

time                 elapsed
----                 -------
2018-01-01T01:00:00Z 3600000000000
2018-01-02T02:00:00Z 7200000000000
2018-01-03T03:00:00Z 10800000000000
2018-01-04T01:00:00Z 3600000000000
2018-01-05T01:00:00Z 3600000000000
2018-01-08T03:00:00Z 10800000000000
2018-01-11T03:00:00Z 10800000000000

Is there a better way to do this? Either a completely different strategy, or without having to add all these integer values to my points, e.g. by applying difference() to a calculated field?

Most of the complexity comes from trying to select every other point from a series. Is there a better way to do that?

Matt
  • 2,153
  • 1
  • 18
  • 29

0 Answers0