How to count patterns in multiple files using awk

Question

I have multiple log files and I need to count the number of occurrences of certain patterns in all those files.

#!/usr/bin/awk
match($0,/New connection from user \[[a-z_]*\] for company \[([a-z_]*)\]/, a) 
{instance[a[1]]++}

END {
    for (i in instance)
        print i ":" instance[i]
}

I am running this script like this:

awk -f script *

But the counts don't look correct. Is my approach above correct for handling multiple files?

The approach seems sound as such. Can you add some simple test data with the actual result you get and the correct result you expect? Please edit your question to include it. — tripleee, Feb 12 '18 at 11:04
@tripleee My question is whether the values in `instance` are retained while awk reads multiple files. — cppcoder, Feb 12 '18 at 12:04
Found the issue: I had left `A-Z` out of the regex, so entries with capital letters were not matched and never counted. — cppcoder, Feb 12 '18 at 12:29
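A minimal sketch of that failure mode, assuming a POSIX shell and any awk. The log line is made up for illustration. With `[a-z_]`, the bracketed name `Acme_Corp` cannot match, so the line is skipped silently rather than reported as an error. `LC_ALL=C` is set so that the range `a-z` covers only the lowercase ASCII letters.

```shell
#!/bin/sh
# C locale: [a-z] means exactly the lowercase ASCII letters.
export LC_ALL=C

# Hypothetical sample line: the company name contains capital letters.
line='New connection from user [bob] for company [Acme_Corp]'

# Original class [a-z_]: the line does not match, so it is silently not counted.
old=$(printf '%s\n' "$line" |
    awk '/company \[[a-z_]*\]/ { n++ } END { print n + 0 }')

# Fixed class [A-Za-z_]: the line matches and is counted.
new=$(printf '%s\n' "$line" |
    awk '/company \[[A-Za-z_]*\]/ { n++ } END { print n + 0 }')

echo "lowercase-only class: $old match(es)"
echo "mixed-case class:     $new match(es)"
```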

Guy · Answer 1 · 2018-02-12T12:50:06.723

Try moving the opening curly brace up to the same line as the match() call. When match(...) stands on a line by itself, awk treats it as a complete rule with the default action {print}, so every matching line is printed in full. The braced block on the next line then becomes a separate rule with no pattern, so instance[a[1]]++ runs for every input line, not just the matching ones.
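The effect is easy to reproduce with a toy pattern. In this sketch (any POSIX awk, made-up three-line input), `/b/` stands in for the real regex and the counter `n` stands in for the `instance` array:

```shell
#!/bin/sh
# Three input lines, only one of which matches /b/.
input='a
b
c'

# Pattern and action on separate lines: awk parses them as two rules.
# Rule 1 is "/b/" with the default action { print }; rule 2 is an action
# with no pattern, which runs for every input line.
two_rules=$(printf '%s\n' "$input" | awk '
/b/
{ n++ }
END { print "n=" n }')

# Pattern and action on the same line: one rule, so n counts only matches.
one_rule=$(printf '%s\n' "$input" | awk '
/b/ { n++ }
END { print "n=" n }')

printf 'separate lines:\n%s\n' "$two_rules"
printf 'same line:\n%s\n' "$one_rule"
```

With the brace on its own line the matching line `b` is echoed and `n` ends up as 3 (one per input line); with the brace on the pattern's line nothing is echoed and `n` is 1.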

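#!/usr/bin/awk -f
# Needs GNU awk: the three-argument form of match() is a gawk extension.
# /pattern/ stands for the question's regex, which must contain a capture
# group so that a[1] holds the company name.

match($0, /pattern/, a) {
    instance[a[1]]++
}

END {
    for (i in instance)
        print i ":" instance[i]
}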
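The script above still relies on gawk for the three-argument match(). As a sketch of the same counting under any POSIX awk, the program below swaps that for plain match() with RSTART/RLENGTH, then uses sub() to cut the company name out of the matched text. It also applies the `A-Z` fix from the comments to both name fields. The two log files and their contents are hypothetical test data:

```shell
#!/bin/sh
# Throwaway directory for two hypothetical log files; removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# C locale: [A-Za-z] means ASCII letters only, and sort orders by byte value.
export LC_ALL=C

cat > "$dir/app1.log" <<'EOF'
New connection from user [alice] for company [acme]
some unrelated line
New connection from user [Bob] for company [Acme_Corp]
EOF

cat > "$dir/app2.log" <<'EOF'
New connection from user [carol] for company [acme]
New connection from user [dave] for company [Acme_Corp]
New connection from user [erin] for company [acme]
EOF

# One awk process reads both files, so instance[] accumulates across them.
counts=$(awk '
match($0, /New connection from user \[[A-Za-z_]*\] for company \[[A-Za-z_]*\]/) {
    s = substr($0, RSTART, RLENGTH)  # the matched text, ending in "[company]"
    sub(/.*\[/, "", s)               # drop everything up to the last "["
    sub(/\]$/, "", s)                # drop the closing "]"
    instance[s]++
}
END {
    for (i in instance)
        print i ":" instance[i]
}' "$dir/app1.log" "$dir/app2.log" | sort)

printf '%s\n' "$counts"
```

The iteration order of `for (i in instance)` is unspecified, which is why the output is piped through sort before it is compared or read.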

Further details of how awk reads the individual files named on the command line are in the GNU awk manual, but they apply to awk generally. BEGIN runs before any file has been read and END after all of them, and variables keep their values from one file to the next, apart from a few built-in ones (FNR, for example, which is the record number within the current file).
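A quick way to see this behaviour, assuming two small throwaway files created just for the demonstration: the `seen` array and NR keep growing across files, while FNR starts again at 1 in each one.

```shell
#!/bin/sh
# Throwaway directory holding two hypothetical input files; removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'x\ny\n' > one.txt
printf 'x\ny\nz\n' > two.txt

# seen[] and NR keep growing across both files; FNR restarts at 1 per file.
trace=$(awk '
{
    seen[$0]++
    print FILENAME, "NR=" NR, "FNR=" FNR
}
END { print "x seen", seen["x"], "times; total lines", NR }' one.txt two.txt)

printf '%s\n' "$trace"
```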
