
I'm trying to find out if I can store values captured at irregular intervals into an RRD.

I have a script which connects to an ActiveMQ server, subscribes to a queue or topic, looks at each message's header timestamp, and compares it with Time.now to give me a delta.

The data I get from my script looks like this:

000000.681 Time Delta
000000.793 Time Delta
000000.583 Time Delta
000001.994 Time Delta
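The delta calculation described above can be sketched as follows. This is a Python sketch of the same idea, not the author's Ruby script (which isn't shown). It assumes the header timestamp is a JMS-style value in milliseconds since the Unix epoch. The timestamps in the usage line are hypothetical.

```python
import time

def time_delta(header_timestamp_ms, now=None):
    """Seconds between a message's header timestamp and 'now'.

    Assumes header_timestamp_ms is milliseconds since the Unix epoch;
    `now` is seconds since the epoch and defaults to time.time().
    """
    if now is None:
        now = time.time()
    return now - header_timestamp_ms / 1000.0

def format_delta(delta):
    # Zero-padded to 10 characters with 3 decimals, e.g. "000000.681"
    return "%010.3f Time Delta" % delta

if __name__ == "__main__":
    # Hypothetical header timestamp and receive time, for illustration only
    print(format_delta(time_delta(1469900000000, now=1469900000.681)))
```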

The issue I face is that messages from ActiveMQ don't necessarily arrive at a regular interval (e.g. 1 per second, or 1 every 2 seconds). At peak times they could come in at 5 a second, and at quiet times 1 every 10 seconds.

I'd like to capture the output into an RRD so I can graph it, but from looking around on the internet it's not clear whether this can be done, or whether I'd be better off capturing the data in some other database/store.

The eventual output I'd like would be a graph showing the time delta for each message.

Having read the docs, it looks like I could create the RRD with --step set to 1 second and the heartbeat set to 2 seconds.
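One way to sanity-check a heartbeat against the arrival pattern described above is to list the gaps between consecutive messages that exceed it, since rrdtool treats the time between two updates that are further apart than the heartbeat as unknown. This is a minimal sketch; the arrival timestamps are hypothetical and only mimic the "quiet" and "busy" rates mentioned in the question.

```python
def intervals_over_heartbeat(timestamps, heartbeat):
    """Return (start, end) pairs of consecutive arrivals further apart than heartbeat.

    Any such interval would be stored as UNKNOWN by rrdtool.
    """
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > heartbeat]

quiet = [0, 10, 20, 30]          # one message every 10 s (hypothetical)
busy = [0, 0.2, 0.4, 0.6, 0.8]   # five messages per second (hypothetical)

print(intervals_over_heartbeat(quiet, 2))    # [(0, 10), (10, 20), (20, 30)]
print(intervals_over_heartbeat(quiet, 100))  # []
print(intervals_over_heartbeat(busy, 2))     # []
```

With a 2 s heartbeat every quiet-time gap would be unknown, which is why a heartbeat comfortably above the longest expected gap matters here.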

I found a couple of posts here and here which talk about being careful with the intervals, and about the fact that my data might be averaged, smoothed or otherwise altered when written to the RRD. But nothing I've found online has a usage case similar to mine, so it's a bit hard to know where I should be looking. I'd like my data to be stored as a point for each message received.

I have a couple of RRDs set up for testing; one uses AVERAGE and the other uses LAST to produce some graphs. My heartbeat is set to 100 seconds, but the step is set to 1 second. I'm now getting data which looks correct. I'm also guessing that the empty spaces in the graph from the LAST RRA are due to my data coming in slower than 1 per second?
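How an RRA row is built from the per-step values (PDPs) can be illustrated with a simplified model. This is a sketch, not rrdtool's code: it assumes the row is unknown when the share of unknown PDPs exceeds the xff value (0.5 in the RRA definitions used here), and it takes LAST to mean the most recent known PDP. The PDP values are hypothetical.

```python
from math import isnan, nan

def consolidate(pdps, cf, xff=0.5):
    """Simplified model of how one RRA row is built from its PDPs.

    The row is unknown (nan) when the share of unknown PDPs is larger
    than xff; otherwise only the known PDPs are used.
    """
    known = [p for p in pdps if not isnan(p)]
    if not known or len(pdps) - len(known) > xff * len(pdps):
        return nan
    if cf == "AVERAGE":
        return sum(known) / len(known)
    if cf == "MAX":
        return max(known)
    if cf == "MIN":
        return min(known)
    if cf == "LAST":
        return known[-1]
    raise ValueError("unsupported consolidation function: " + cf)

row = [0.5, nan, 0.75, 1.0]  # four 1 s PDPs, one unknown (hypothetical)
for cf in ("AVERAGE", "MAX", "LAST"):
    print(cf, consolidate(row, cf))  # AVERAGE 0.75, MAX 1.0, LAST 1.0
print(consolidate([nan, nan, nan, 0.5], "AVERAGE"))  # nan: 3 of 4 unknown exceeds xff
```

Note that with a steps value of 1 (e.g. `RRA:LAST:0.5:1:86400`) each row is a single PDP, so AVERAGE, MAX and LAST rows at that resolution hold the same value.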

I'll post my create code & output as an answer.

user3788685

2 Answers


rrdtool will always store data at regular intervals. As data is handed over to rrdtool, it first gets re-sampled to the --step interval and then further consolidated into the intervals set up in the RRAs.

The exact arrival time of the data (to the millisecond) is taken into account when the re-sampling takes place.

If two data points are further apart than the heartbeat (mrhb) specified for the data source, the data is considered non-continuous and rrdtool will store 'unknown' for the affected interval.
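The re-sampling described above can be approximated with a simplified model of a GAUGE data source. This is a sketch under stated assumptions, not rrdtool's actual implementation: each value is treated as the rate for the interval since the previous update, an interval longer than the heartbeat is left unknown, and each step's value is the time-weighted average of the known rates inside it. Real rrdtool has additional rules this sketch ignores. The sample timestamps and values are hypothetical, borrowing the deltas from the question.

```python
from math import isnan, nan

def resample_gauge(updates, step, heartbeat):
    """Simplified model (not rrdtool's code) of GAUGE re-sampling.

    updates: (timestamp, value) pairs in time order; the first pair only
    sets the starting time. step must be an int. Returns
    {step_end: value or nan} for every completed step.
    """
    weighted, known = {}, {}
    for (t0, _), (t1, value) in zip(updates, updates[1:]):
        if t1 - t0 > heartbeat:
            continue  # too far apart: the whole interval stays unknown
        t = t0
        while t < t1:
            step_end = (int(t // step) + 1) * step  # step covers (step_end - step, step_end]
            seg_end = min(t1, step_end)
            weighted[step_end] = weighted.get(step_end, 0.0) + value * (seg_end - t)
            known[step_end] = known.get(step_end, 0.0) + (seg_end - t)
            t = seg_end
    first = (int(updates[0][0] // step) + 1) * step
    last = int(updates[-1][0] // step) * step  # last fully completed step
    return {end: (weighted[end] / known[end] if known.get(end) else nan)
            for end in range(first, last + 1, step)}

msgs = [(0, 0.0), (0.5, 0.681), (1.0, 0.793), (2.0, 0.583), (12.0, 1.994)]
print(resample_gauge(msgs, step=1, heartbeat=100))  # steps 3..12 all hold 1.994
print(resample_gauge(msgs, step=1, heartbeat=5))    # steps 3..12 are nan
```

Two updates inside the same second are averaged by the time each one covers, and a slow message's value is spread back over the whole gap, provided the gap is within the heartbeat.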

Tobi Oetiker
  • I think I'm going to change the way I do it. I managed to get some sort of graph, but I'm not sure what it's making the averages from. My thinking now is to use the existing Ruby script to sample every 10 timestamps, take an average, then write it to a log file which I can pick up with MRTG every minute – user3788685 Jul 30 '16 at 18:05
  • This is not necessary; you can run as many updates as you want, and rrdtool will build the average for you. – Tobi Oetiker Jul 30 '16 at 19:24
  • I think I've managed to sort it out. I'll update my question in a bit with the extra info. It's been running all night and the graphs look OK vs the data I'm observing. I was setting the heartbeat too low, so I was missing data; now that it's set much higher, things look good. – user3788685 Jul 31 '16 at 12:13

I ended up making two sets of RRDs to experiment with:

rrdtool create test1.rrd \
--step '1' \
'DS:ds0:GAUGE:5:0:U' \
'RRA:AVERAGE:0.5:1:86400' \
'RRA:MAX:0.5:1:86400' \
'RRA:AVERAGE:0.5:60:10080' \
'RRA:MAX:0.5:60:10080' \
'RRA:AVERAGE:0.5:120:21600' \
'RRA:MAX:0.5:120:21600' \
'RRA:AVERAGE:0.5:300:105120' \
'RRA:MAX:0.5:300:105120'

and

rrdtool create test.rrd \
--step '1' \
'DS:ds0:GAUGE:5:0:U' \
'RRA:AVERAGE:0.5:1:86400' \
'RRA:LAST:0.5:1:86400' \
'RRA:AVERAGE:0.5:60:10080' \
'RRA:LAST:0.5:60:10080' \
'RRA:AVERAGE:0.5:120:21600' \
'RRA:LAST:0.5:120:21600' \
'RRA:AVERAGE:0.5:300:105120' \
'RRA:MAX:0.5:300:105120'

Which allows me to store:

1 s resolution, kept for 1 day
1 min resolution, kept for 7 days
2 min resolution, kept for 30 days
5 min resolution, kept for 1 year
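The retention figures above follow from step × PDPs per row × rows. This small sketch checks that arithmetic for the four RRA resolutions in the create commands (the helper name is illustrative only).

```python
def retention(step, pdp_per_row, rows):
    """Seconds covered by an RRA: base step x PDPs per row x number of rows."""
    return step * pdp_per_row * rows

STEP = 1  # --step '1'
for pdp_per_row, rows in [(1, 86400), (60, 10080), (120, 21600), (300, 105120)]:
    secs = retention(STEP, pdp_per_row, rows)
    print(f"{STEP * pdp_per_row}s per row, kept for {secs // 86400} day(s)")
# 1s -> 1 day, 60s -> 7 days, 120s -> 30 days, 300s -> 365 days
```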

Which makes these nice graphs:

[Graphs: "1 Hour Average & Max" and "1 Hour Average & Last"]

The graphs were made in PHP with the following code:

<?php
  // 1-hour graph from the AVERAGE and MAX archives (test1.rrd defines both CFs)
  $opts = array(
                '--width', '600',
                '--height', '100',
                '--title', 'Avg Time Delta xxxxxxxxxx (Last 1 Hr)',
                '--vertical-label', 'Time Delta',
                '--watermark', 'xxxxxxxxxx',
                '--start', 'end-1h',
                'DEF:out=test1.rrd:ds0:AVERAGE',
                'DEF:max=test1.rrd:ds0:MAX',
                'AREA:out#9966FF:Avg Time Delta',
                'LINE1:max#996600:Max Time Delta',
               );

  // rrd_graph() returns an array on success and false on failure
  $ret = rrd_graph("graphs/1hr-graph.png", $opts);

  if( !is_array($ret) )
  {
    $err = rrd_error();
    echo "rrd_graph() ERROR: $err\n";
  }
  echo '<img src="http://server/graphs/1hr-graph.png">';
  echo '<BR>';
?>

<?php
  // 1-hour graph from the AVERAGE and LAST archives (test.rrd defines both CFs)
  $opts = array(
                '--width', '600',
                '--height', '100',
                '--title', 'Last Time Delta xxxxxxxxxx (Last 1 Hr)',
                '--vertical-label', 'Time Delta',
                '--watermark', 'xxxxxxxxxx',
                '--start', 'end-1h',
                'DEF:avg=test.rrd:ds0:AVERAGE',
                'DEF:last=test.rrd:ds0:LAST',
                'AREA:avg#99AAFF:Avg Time Delta',
                'LINE1:last#99AA00:Last Time Delta',
               );

  $ret = rrd_graph("graphs/1hr-last.png", $opts);

  if( !is_array($ret) )
  {
    $err = rrd_error();
    echo "rrd_graph() ERROR: $err\n";
  }
  echo '<img src="http://server/graphs/1hr-last.png">';
?>

From my own sanity checking and watching the data in real time, it looks like both of those graphs are correct, but they behave in slightly different ways. When the data feed this is monitoring is quiet and I'm only getting 1 message every 10 seconds, I get a lot of gaps in the LAST graphs, whereas the AVERAGE graphs are smoothed out to fill the gaps. I also tried setting up another RRD with ABSOLUTE, but the graphs for that look 'wrong' and the times are all below 1.0.
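The difference between GAUGE and ABSOLUTE mentioned above comes down to how rrdtool turns an update into a rate: GAUGE keeps the value as it is, while ABSOLUTE treats it as a count accumulated since the previous update and divides it by the elapsed seconds. That division would plausibly explain ABSOLUTE values below 1.0 when messages are more than a second apart. This is a simplified sketch covering only those two types; the 4 s gap is hypothetical.

```python
def rate(ds_type, value, seconds_since_last_update):
    """Per-update rate as rrdtool derives it for two DS types (simplified).

    GAUGE keeps the value as-is; ABSOLUTE treats it as a count since the
    previous update and divides by the elapsed seconds.
    """
    if ds_type == "GAUGE":
        return value
    if ds_type == "ABSOLUTE":
        return value / seconds_since_last_update
    raise ValueError("only GAUGE and ABSOLUTE are modelled here")

# A 1.994 s delta arriving 4 s after the previous message (hypothetical)
print(rate("GAUGE", 1.994, 4))     # 1.994
print(rate("ABSOLUTE", 1.994, 4))  # 0.4985
```

For a measured value such as a time delta, GAUGE is the type that stores what the script actually reports.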

So it looks like I can feed my RRD at whatever interval I like from my script. The RRD re-samples my data to its defined step (in my case 1 second) and then processes it according to the data source type (GAUGE, ABSOLUTE, etc.). With my heartbeat set to 100, I should always receive some data before those 100 seconds time out, thus avoiding NaN entries in my database.

At the moment I can't tell how well behaved this config will be during times of disruption (e.g. delayed messages from the AMQ server). I'll try to run some tests when I get some spare time and report back with anything significant.

user3788685