AWK - Organising/Sorting multidimensional arrays 3.1.7

Question

My Question

I need to process every line of a log file (see the input sample below), group the entries by hour (the first two characters of $3, 00 to 23; not every hour is always present), and print a single line for each hour, in order, without using a variable that increments.

Input file Sample

...
INFO  2016-06-15 00:00:30.065 TelegramDispatcher                                       - --> Complete telegram dispatching took 8.9ms (canHandle(56:TelegramHandlerECHLane) took 0.0ms, handleTelegram took 2.2ms, commit took 5.6ms, doACK took 0.6ms, doNAK took -0.0ms performAfterCommit took 0.3ms, failedCanHandle took 0.1ms)
INFO  2016-06-15 00:00:30.072 TelegramDispatcher                                       - --> Complete telegram dispatching took 7.2ms (canHandle(56:TelegramHandlerECHLane) took 0.0ms, handleTelegram took 2.0ms, commit took 4.1ms, doACK took 0.7ms, doNAK took -0.0ms performAfterCommit took 0.3ms, failedCanHandle took 0.1ms)
INFO  2016-06-15 00:00:30.114 TelegramDispatcher                                       - --> Complete telegram dispatching took 12.4ms (canHandle(69:TelegramHandlerTUNotification) took 0.0ms, handleTelegram took 4.3ms, commit took 6.6ms, doACK took 1.0ms, doNAK took -0.0ms performAfterCommit took 0.3ms, failedCanHandle took 0.2ms)
INFO  2016-06-15 00:00:30.165 TelegramDispatcher                                       - --> Complete telegram dispatching took 19.6ms (canHandle(69:TelegramHandlerTUNotification) took 0.0ms, handleTelegram took 3.6ms, commit took 14.3ms, doACK took 0.9ms, doNAK took -0.0ms performAfterCommit took 0.5ms, failedCanHandle took 0.1ms)
INFO  2016-06-15 00:00:30.271 TelegramDispatcher                                       - --> Complete telegram dispatching took 10.0ms (canHandle(69:TelegramHandlerTUNotification) took 0.0ms, handleTelegram took 3.7ms, commit took 4.8ms, doACK took 0.9ms, doNAK took -0.0ms performAfterCommit took 0.3ms, failedCanHandle took 0.1ms)
INFO  2016-06-15 00:00:30.300 TelegramDispatcher                                       - --> Complete telegram dispatching took 18.7ms (canHandle(61:TelegramHandlerPackingOrderBufferHanging) took 0.0ms, handleTelegram took 7.3ms, commit took 10.2ms, doACK took 0.8ms, doNAK took -0.0ms performAfterCommit took 0.3ms, failedCanHandle took 0.1ms)
...

Current Code

#!/usr/bin/gawk -f

BEGIN {
        print "-------------------------------------------------------"
        print "----------Telegram Processing Time by Hour-------------"
        print "-------------------------------------------------------"
} #End of BEGIN
{ #Start of MID
        key = substr($12,match($12,":")+1,match($12,")")-15); #Message Extracted 10 Total
        key2 = substr($3,1,2) #Hour
        MSG_TYPE[key]++ #Distinct Message
        MSG_HR[key,key2] += $11 #Tots up Time Took for each MSG by Hour
} #End of MID
END {
                for (msg in MSG_TYPE) {
                        print msg
                        print "-----------------------------------"
                        for(msghr in MSG_HR) {
                        split(msghr,indices,SUBSEP);
                        hr = indices[2];
                        print msg #Added this to try and Debug
                        print hr
                        print "AVG by Hour: "MSG_HR[msghr]"ms"
                        }
                print "\n"
                        }
} #End of END

To explain this code a little: the MSG_HR array currently adds up $11, which for reference is the ***ms value just after the first "took"; key holds the message type and key2 holds the hour.

Current Output Sample

...
TelegramHandlerECHLane
-----------------------------------
TelegramHandlerECHLane
14
AVG by Hour: 80950.6ms
TelegramHandlerECHLane
08
AVG by Hour: 25.2ms
TelegramHandlerECHLane
01
AVG by Hour: 75053.9ms
...
TelegramHandlerLaneStatusHangingMPA
-----------------------------------
TelegramHandlerLaneStatusHangingMPA
14
AVG by Hour: 80950.6ms
TelegramHandlerLaneStatusHangingMPA
08
AVG by Hour: 25.2ms
TelegramHandlerLaneStatusHangingMPA
01
AVG by Hour: 75053.9ms
...

Desired Output Sample

...
TelegramHandlerECHLane
-----------------------------------
00
AVG by Hour: 
01
AVG by Hour:
02
AVG by Hour:
03
AVG by Hour:
04
AVG by Hour:
05
AVG by Hour:
06
AVG by Hour:
...

There are 10 different message types (MSG_TYPE) that I am trying to lay out in the format above. I am unable to upgrade to awk 4.1.0, although I know that would make things easier (true multidimensional arrays, etc.).

Any and all help will be greatly appreciated.

instead of `HR` just print a variable you keep incrementing. Not much more is needed. — fedorqui, Jun 23 '16 at 13:24
I am sorry, I don't quite follow you. I have been working on this since 6am (2pm atm) so I am a little code blind. — glly, Jun 23 '16 at 13:25
I mean you are printing 14, 08, 01, 15... and you want to replace it with 00, 01, 02... Then, just use a variable you keep incrementing — fedorqui, Jun 23 '16 at 13:28
but I need to reference all lines in the log file and total up the value of `$11` and then group that count by `key` and the `substr` of `$3` — glly, Jun 23 '16 at 13:38
Then this is not a [mcve] clear enough to understand the problem you are facing. — fedorqui, Jun 23 '16 at 13:41
In `for(msghr in MSG_HR)` why you do loop over the whole `MSG_HR` and not over those elements of it which have key=msg ? — user31264, Jun 23 '16 at 14:01
That part was me trying to re-write the code due to the old version of `awk` that I am using as I had previously written it for `awk` 4.1.0 where as it will be run on a production server running `awk` 3.1.7. That particular code block is a modified version of http://stackoverflow.com/questions/14280877/multidimensional-arrays-in-awk — glly, Jun 23 '16 at 14:04
Again: Don't use all upper case for variable names in awk or in shell (unless exported) to avoid clashing with builtin variables and obfuscating your code by making your code look like it's using builtin variables when it's not. Also, make sure your posted expected output is derived from your posted sample input, not the output from some other set of input that you haven't shown us. — Ed Morton, Jun 23 '16 at 19:44

score 1 · Accepted Answer · answered Jun 23 '16 at 14:07

It seems you have a bug in your code. Excerpt from your code (I fixed the indentation):

        for (msg in MSG_TYPE) {
            print msg
            print "-----------------------------------"
            for(msghr in MSG_HR) {
                split(msghr,indices,SUBSEP);
                msg = indices[1];

Your outer loop iterates over messages and uses msg as its loop variable, but the inner loop reassigns it (msg = indices[1];). Changing a loop variable within the loop is probably a bug. Also, the inner loop processes every member of MSG_HR rather than only those belonging to the current message, which seems strange.
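As a minimal sketch of restricting the inner loop to the current message, the version below splits each composite key and skips entries whose first index differs from msg. It assumes a simplified three-column input (message, hour, milliseconds) rather than the real log, and the names sum_by_msg_hour, total, seen and idx are hypothetical, chosen only for illustration. It relies on SUBSEP-joined keys, so it should not need true arrays of arrays.

```shell
# Hypothetical wrapper name; the awk program inside is the part that matters.
sum_by_msg_hour() {
  awk '
  {
      # Assumed simplified input: $1 = message, $2 = hour, $3 = milliseconds.
      total[$1, $2] += $3      # composite key: message SUBSEP hour
      seen[$1]                 # referencing the element records the message
  }
  END {
      for (msg in seen) {
          print msg
          for (key in total) {
              split(key, idx, SUBSEP)
              if (idx[1] != msg)
                  continue     # entry belongs to another message, skip it
              print idx[2], total[key] "ms"
          }
      }
  }' "$@"
}
```

With the real log, the key and key2 expressions from the question would take the place of $1 and $2, and $11 the place of $3. Note that the order in which a for-in loop visits keys is unspecified in awk, so the hours within each message may still appear in any order.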

user31264

Ok, I have removed/commented out that particular line, `msg = indices[1];` but I am still seeing the same output, albeit a lot smaller. – glly Jun 23 '16 at 14:13
Ok, could you recommend how I would process the lines specific to the message? – glly Jun 23 '16 at 14:15
Whatever the awk version, you do a loop over the whole MSG_HR, but you need a loop over those elements where key=msg. It is a bug. After you fix the bug, I may respond to the original question. – user31264 Jun 23 '16 at 14:33
I have updated the original question + the output, removing the line in question as now that it has been pointed out it makes sense and I have added some Debugging just for my own peace of mind. – glly Jun 23 '16 at 14:37
The line `print msg #Added this to try and Debug` is useless, you need `print indices[1]` – user31264 Jun 23 '16 at 15:29
Ok I have added that and it isn't comparing the `indices[1]` to the `msg` – glly Jun 23 '16 at 15:54
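Printing indices[1] only shows the value; an explicit test is still needed to tie an entry to the current message. One possible way to get both the comparison and the 00 to 23 ordering from the question is to generate the zero-padded hour strings and probe for each (message, hour) pair with the `in` operator. The sketch below assumes a simplified three-column input (message, hour, milliseconds), and the names avg_by_hour, sum, cnt and seen are hypothetical. It does use a loop counter, which the question hoped to avoid, but the counter only generates candidate keys; hours with no data are skipped. It also keeps a count per key so that the printed figure is an average rather than a running total.

```shell
# Hypothetical wrapper name; the awk program inside is the part that matters.
avg_by_hour() {
  awk '
  {
      # Assumed simplified input: $1 = message, $2 = hour, $3 = milliseconds.
      sum[$1, $2] += $3
      cnt[$1, $2]++
      seen[$1]
  }
  END {
      for (msg in seen) {
          print msg
          for (h = 0; h < 24; h++) {
              hr = sprintf("%02d", h)        # "00", "01", ... "23"
              if ((msg, hr) in sum)          # only hours present for this message
                  printf "%s AVG by Hour: %.1fms\n", hr, sum[msg, hr] / cnt[msg, hr]
          }
      }
  }' "$@"
}
```

For example, feeding it the three lines `A 14 10`, `A 01 2` and `A 01 4` prints `A`, then `01 AVG by Hour: 3.0ms`, then `14 AVG by Hour: 10.0ms`. In the original script, a field such as `8.9ms` is still usable in arithmetic because awk converts the leading numeric prefix.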