As the title suggests I have data stored in multiple flat files in the following format:
215,,,215,16.4,0,2011/05/11 00:00:06
215,,,215,16.3,0,2011/05/11 00:00:23
217,,,217,16.3,0,2011/05/11 00:00:11
213,,,213,16.3,0,2011/05/11 00:00:17
215,,,215,16.3,0,2011/05/11 00:00:30
I am currently using the following awk command:
awk -F ',' '{gsub(/[\/:]/," ",$7); print mktime($7)":"$1":"$5}' MyFile
This gives me the following output (the date is converted to epoch time, the fields are colon-separated and reordered a little):
1305068406:215:16.4
1305068423:215:16.3
1305068411:217:16.3
1305068417:213:16.3
1305068430:215:16.3
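The conversion step can be wrapped in a small shell function. The sketch below (the name to_rrd_lines is hypothetical) works on a copy of field 7 and leaves the time zone to the caller. mktime() is not part of POSIX awk; it is a GNU awk extension, so this assumes gawk or another awk that provides it. It also interprets the date in the current time zone: the epochs above (1305068406 for 2011/05/11 00:00:06) correspond to a UTC+1 offset, so setting TZ explicitly makes the result independent of the machine that runs the script.

```shell
# Hypothetical helper: convert raw CSV lines into rrdtool "epoch:v1:v2" data lines.
# Reads the files given as arguments, or stdin if none are given.
# mktime() is not POSIX awk; it needs GNU awk (or another awk that provides it).
to_rrd_lines() {
  awk -F ',' '{
    # "2011/05/11 00:00:06" -> "2011 05 11 00 00 06", the format mktime() expects
    ts = $7
    gsub(/[\/:]/, " ", ts)
    # mktime() interprets the date in the local time zone given by TZ
    print mktime(ts) ":" $1 ":" $5
  }' "$@"
}

# Usage (Europe/London is only an example zone; use the one the logger wrote in):
# export TZ=Europe/London
# to_rrd_lines MyFile | sort -n
```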
The input file may not be in date order because of some hiccups while it was being written, so next I pipe the output of the awk command above into sort -n, which sorts the lines numerically with the oldest epoch time at the top:
1305068406:215:16.4
1305068411:217:16.3
1305068417:213:16.3
1305068423:215:16.3
1305068430:215:16.3
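The sort step can also take care of duplicate timestamps. In this sketch (the function name sort_unique_epochs is hypothetical), -t: -k1,1n makes sort compare only the epoch field, numerically, and -u keeps one line per timestamp, so the $1>p check in the next awk command no longer has anything to remove. Which of several lines sharing a timestamp survives is not something to rely on.

```shell
# Hypothetical helper: sort "epoch:v1:v2" lines by timestamp and drop duplicate timestamps.
#   -t:     fields are separated by colons
#   -k1,1n  compare only the first field (the epoch), numerically
#   -u      output only one line for each set of lines with equal keys
sort_unique_epochs() {
  sort -t: -k1,1n -u "$@"
}

# Usage:
# awk -F ',' '{gsub(/[\/:]/," ",$7); print mktime($7)":"$1":"$5}' MyFile | sort_unique_epochs
```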
I am then piping the sorted output into another awk command:
awk -F ':' 'BEGIN {ORS=" ";c="rrdtool update ccdata2.rrd"; print c} NR % 100 == 0 {print "&& "c} $1>p {print $0;p=$0}'
This generates the output below and enforces two rules:
- Every 100 records, it prints && and a new rrdtool update ccdata2.rrd prefix (it doesn't seem that rrdtool likes an update with a lot of records).
- It only prints an rrd data line if its epoch time is greater than that of the last line printed.
The final output is as follows:
rrdtool update ccdata2.rrd 1305068406:215:16.4 1305068411:217:16.3 1305068417:213:16.3 1305068423:215:16.3 1305068430:215:16.3
If there are 300 records, it would be (you get the idea):
rrdtool update ccdata2.rrd x:x:x <100 times> && rrdtool update ccdata2.rrd x:x:x <another 100 times>
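Two details of that awk program are worth checking. NR % 100 counts input records rather than printed ones, so the first batch gets 99 data points and each skipped duplicate leaves its batch one data point short. And because p holds the whole previous line, $1>p is a string comparison, which only works because all the epochs have the same number of digits. A minimal sketch that keeps the same two rules but counts the data points actually kept, compares timestamps numerically and prints one rrdtool update command per line (the function name batch_updates and its batch-size parameter are hypothetical):

```shell
# Hypothetical helper: turn sorted "epoch:v1:v2" lines (stdin) into rrdtool commands.
#   $1 = RRD file, $2 = maximum data points per command (default 100)
# Prints one "rrdtool update FILE point point ..." command per line.
batch_updates() {
  awk -F ':' -v rrd="$1" -v size="${2:-100}" '
    # Rule 2: keep a line only if its epoch is newer than the last kept one
    # ($1 + 0 forces a numeric comparison).
    n > 0 && $1 + 0 <= last { next }
    {
      last = $1 + 0
      # Rule 1: start a new command every "size" kept data points.
      if (n % size == 0) {
        if (n > 0) printf "\n"
        printf "rrdtool update %s", rrd
      }
      printf " %s", $0
      n++
    }
    END { if (n > 0) printf "\n" }
  '
}
```

For example: ... | sort -n | batch_updates ccdata2.rrd 100 | bash. With one command per line, a failed batch no longer stops the later ones; piping into bash -e instead brings back the stop-at-first-failure behaviour of the && chain.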
I then pipe the output of that command to bash so that the shell executes the generated rrdtool update commands.
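Generating shell code and piping it into bash works, but xargs can do the batching and run rrdtool directly, so the data never passes through the shell parser. A minimal sketch, assuming the data lines contain no spaces or quote characters (true for epoch:value:value lines): xargs -n 100 appends at most 100 input items after rrdtool update ccdata2.rrd and runs as many rrdtool commands as needed. The function name feed_rrd is hypothetical.

```shell
# Hypothetical helper: feed "epoch:v1:v2" lines on stdin to rrdtool in batches.
#   $1 = RRD file, $2 = maximum data points per rrdtool call (default 100)
# xargs appends up to $2 whitespace-separated input items (here, one per line)
# after "rrdtool update FILE" and runs the command as many times as needed.
feed_rrd() {
  xargs -n "${2:-100}" rrdtool update "$1"
}

# Usage (duplicate timestamps removed first, as the original pipeline does):
# awk -F ',' '{gsub(/[\/:]/," ",$7); print mktime($7)":"$1":"$5}' MyFile |
#   sort -t: -k1,1n -u | feed_rrd ccdata2.rrd 100
```

Unlike the && chain, xargs carries on after a failed batch; GNU xargs reports this by exiting with status 123.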
The full command is:
awk -F ',' '{gsub(/[\/:]/," ",$7); print mktime($7)":"$1":"$5}' MyFile | sort -n | awk -F ':' 'BEGIN {ORS=" ";c="rrdtool update ccdata2.rrd"; print c} NR % 100 == 0 {print "&& "c} $1>p {print $0;p=$0}' | bash
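As for merging the two awk commands: the raw timestamps are zero-padded YYYY/MM/DD HH:MM:SS strings, so sorting the raw lines as text on field 7 already puts them in chronological order. The sort can therefore run first, on all the flat files at once, and a single awk program can do the conversion, the duplicate check and the batching. A sketch under those assumptions (it also needs an awk with mktime(), and ignores the repeated local hour at the end of daylight saving time, when text order and time order disagree); the function raw_to_updates and its arguments are hypothetical:

```shell
# Hypothetical helper: raw CSV lines -> batched "rrdtool update" commands, one per line.
#   $1 = RRD file, $2 = batch size, remaining arguments = input files (stdin if none)
# Assumes field 7 looks like "2011/05/11 00:00:06" (zero-padded, so text order
# is chronological) and an awk with mktime(), such as GNU awk.
raw_to_updates() {
  rrd=$1
  size=$2
  shift 2
  # LC_ALL=C gives a plain byte-wise text sort on field 7
  LC_ALL=C sort -t, -k7,7 "$@" | awk -F ',' -v rrd="$rrd" -v size="$size" '
    {
      ts = $7
      gsub(/[\/:]/, " ", ts)        # -> "2011 05 11 00 00 06"
      t = mktime(ts)                # epoch seconds in the current TZ
      if (n > 0 && t <= last) next  # drop repeated (or older) timestamps
      last = t
      if (n % size == 0) {          # start a new command every "size" kept points
        if (n > 0) printf "\n"
        printf "rrdtool update %s", rrd
      }
      printf " %s:%s:%s", t, $1, $5
      n++
    }
    END { if (n > 0) printf "\n" }
  '
}
```

For example, raw_to_updates ccdata2.rrd 100 MyFile OtherFile | bash, where OtherFile stands for any further flat file.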
How could the above process be improved? How would you achieve the same thing? Please explain why in your answer (e.g. could the two awk commands be combined into one?).