
I need to count the messages per hour in my log file. Every line in the log file begins with a timestamp, so I am using the following 'for' loop and 'grep' command:

for i in `seq 0 23`
do egrep "$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l
done

This gives me the number of messages for each hour from 0 to 23.

However, this does not work for single-digit hours such as 5:23:32, because they are preceded by a space. The grep would then have to be:

egrep " $i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l

Otherwise, it would incorrectly match lines starting with, say, 15:23:32.

So how can I tell grep that the hour may be preceded only by a space or the start of the line?

punekr12

4 Answers


I think I can get rid of your for loop. This works if the time (rather than a date) begins each line:

$ awk -F : '/some_pattern/ { print $1 }' file |sort |uniq -c

This searches for your desired pattern (much like grep), then prints the first field (fields are delimited by colons), which is the hour. The hours are then sorted, and uniq -c counts how many times each distinct hour occurs and prints the counts on standard output.

However, let's say your logs look like /var/log/syslog, which has lines that look like this:

Feb  9 01:23:45 mycomputer service[PID]: details...

In this case, you have to tell AWK where to look:

$ awk '/some_pattern/ { gsub(/:.*/,"",$3); print $3 }' file |sort |uniq -c

This searches for your desired pattern (much like grep), then deletes the first colon and everything after it from the third field (the time) and prints what remains (the hour). The rest works as described above.
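To try the syslog variant without a real log, here is a self-contained run on a few made-up lines in the format shown above (the host, service names and PIDs are invented for illustration). Note that awk's default field splitting treats the double space in "Feb  9" as a single separator, so the time is still $3.

```shell
# Made-up syslog-style sample lines (hypothetical host and service names).
sample='Feb  9 01:23:45 mycomputer service[123]: some_pattern here
Feb  9 01:59:10 mycomputer service[123]: some_pattern again
Feb  9 13:02:07 mycomputer other[456]: unrelated
Feb  9 13:30:00 mycomputer service[123]: some_pattern late'

# Default field splitting collapses runs of blanks, so $3 is the time.
# gsub() deletes the first colon and everything after it, leaving the hour.
printf '%s\n' "$sample" |
    awk '/some_pattern/ { gsub(/:.*/, "", $3); print $3 }' |
    sort | uniq -c
```

This should report 2 matches for hour 01 and 1 for hour 13; the line without some_pattern is filtered out before counting.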

A sample output (of either of the above variants):

 12 07
 34 08
 30 09
 51 10
536 11
346 12
123 13

This notes that there were twelve matches to my query at 7 am and that I didn't really start using this system until 11 am.
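The sort | uniq -c output lists only hours that had at least one match, and it treats " 5" and "05" as different keys. A minimal sketch of an alternative, assuming POSIX awk and that every line starts with the time (lines that do not would be counted as hour 0), converts the hour to a number with + 0 and counts in an awk array, printing all 24 hours including zeros. The function name count_per_hour and the sample data are hypothetical.

```shell
# Hypothetical helper: count lines matching an awk regex, per hour 0-23.
# Assumes every line starts with the time (H:MM:SS or HH:MM:SS) and that a
# single-digit hour may be padded with a space or a zero.
count_per_hour() {
    # $1 = regex used to filter lines, $2 = file to read ("-" for stdin)
    awk -F : -v pat="$1" '
        $0 ~ pat {
            count[$1 + 0]++    # " 5", "05" and "5" all become hour 5
        }
        END {
            for (h = 0; h < 24; h++)
                printf "%2d %d\n", h, count[h] + 0
        }' "$2"
}

# Usage with made-up sample lines on standard input:
printf ' 5:23:32 foo\n15:23:32 foo\n 5:59:01 bar\n' | count_per_hour foo -
```

Here hours 5 and 15 should each show 1 and every other hour 0. Doing the counting inside awk also avoids the extra sort and uniq processes.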

Adam Katz
  • `/some_pattern/` is not required; it can be left out. – Swiss Feb 09 '15 at 23:52
  • @Swiss you are right. I was merely trying to mimic the logic of the original post. It would certainly be faster without the regex, but it's unclear whether there is further content to be filtered through. – Adam Katz Feb 10 '15 at 00:34

Using egrep

for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" 'filename'; done

^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] anchors the match to the start of the line, and [[:space:]]* allows zero or more whitespace characters before the hour. So grep matches a line that starts with whitespace followed by the hour, or with the hour itself, but not a line where another digit comes before the hour (such as 15:23:23 when $i is 5).

For example, using your command to find hour 5 (where $i=5), we get

 5:23:23
15:23:23

Using the command above, we get only

 5:23:23

grep also has a -c option that counts matching lines, so you can use it instead of piping to wc -l, as in this example:

for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <pattern>" 'filename'; done
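A self-contained reproduction of the example above, using two made-up log lines and grep -E (the standard spelling of egrep). It counts hour 5 with the question's unanchored pattern and with the anchored one; "some_pattern" stands in for the real filter.

```shell
# Two made-up log lines: a space-padded single-digit hour and a two-digit hour.
sample=' 5:23:23 some_pattern
15:23:23 some_pattern'
i=5

# Unanchored, as in the question: "5:23:23" also matches inside "15:23:23".
printf '%s\n' "$sample" | grep -cE "$i:[0-9][0-9]:[0-9][0-9] some_pattern"

# Anchored with optional leading whitespace: only " 5:23:23" matches.
printf '%s\n' "$sample" | grep -cE "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] some_pattern"
```

The first count should be 2 and the second 1.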
repzero

grep "^[ 0-9][0-9]...

I think this is what you're looking for, unless I've misunderstood your question. Add the space to the first bracket expression as an allowed character, and anchor the pattern to the beginning of the line.
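Applied to the per-hour loop from the question, the same idea can become an optional bracket expression [ 0]? in front of the hour, so " 5:" (and "05:") is counted for hour 5 but "15:" is not. This is an adaptation of the answer's idea, not its exact command: a minimal sketch assuming grep -E and that the timestamp starts in the first column. The function name hourly_counts is hypothetical, and some_pattern is the placeholder from the question.

```shell
# Hypothetical helper: print "hour count" for hours 0-23.
# $1 = file to search; lines are assumed to start with the timestamp.
hourly_counts() {
    for i in $(seq 0 23); do
        # [ 0]? allows one optional space or zero before the hour, and the
        # ^ anchor stops "15:" from being counted as hour 5.
        n=$(grep -cE "^[ 0]?$i:[0-9][0-9]:[0-9][0-9] some_pattern" "$1")
        printf '%2d %s\n' "$i" "$n"
    done
}

# Usage (placeholder file name from the question):
# hourly_counts filename
```

Unlike the question's loop, each count is labelled with its hour, so the output is readable on its own.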

David Hoelzer

To match a timestamp where the hour from 0 to 9 is space-padded or zero-padded:

With basic regular expressions:

grep '^\([ 01][0-9]\|2[0-3]\):[0-5][0-9]:[0-5][0-9]' file

or with extended regular expressions:

grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' file

ref: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html
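A quick way to see which timestamps the extended pattern accepts, using made-up sample lines: it should match space- and zero-padded single-digit hours and hours 10 to 23, and reject an out-of-range hour such as 24 as well as an unpadded 5:23:32 at the start of the line.

```shell
# Made-up sample lines; the text after each timestamp says why it should
# or should not match the extended regular expression.
printf '%s\n' \
    ' 5:23:32 padded with a space' \
    '05:23:32 padded with a zero' \
    '23:59:59 last valid hour' \
    '24:00:00 hour out of range' \
    '5:23:32 no padding' |
    grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}'
```

Only the first three lines should be printed.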

glenn jackman