How to grep lines starting with a digit or white space

Question

I need to count messages per hour in my log file. Every line of the log file begins with a timestamp, so I am using the following 'for' loop and 'grep' command to do this:

for i in `seq 0 23`
do egrep "$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l
done

This gives me the number of messages for each hour from 0 to 23.

However, this does not work for single-digit hours such as 5:23:32, because the hour is preceded by a space. The grep would then have to be:

egrep " $i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l

Otherwise it also incorrectly matches lines starting with, say, 15:23:32.

So how can I tell grep that the hour digit may be preceded only by a space or by the start of the line?

score 1 · Answer 1 · answered Feb 09 '15 at 21:32

I think I can get rid of your for loop. This will work if that time (rather than a date) begins each line:

$ awk -F : '/some_pattern/ { print $1 }' file |sort |uniq -c

This selects the lines matching your pattern (much like grep), then prints the first field (fields being delimited by colons), which is the hour. The hours are then sorted, and uniq -c counts how often each distinct hour occurs and prints the counts to standard output.
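The first awk command can be wrapped in a small shell function for reuse. This is a sketch assuming lines start with a timestamp like those in the question (hour possibly space-padded, then a colon); count_per_hour is a hypothetical name. Unlike the one-liner, it strips the padding space so that " 5" and "5" are counted together, and it uses sort -n so the hours come out in numeric order.

```shell
# Hypothetical helper: count lines matching an awk regex, per hour, in a log
# whose lines start with H:MM:SS or HH:MM:SS (hour possibly space-padded).
#   $1 = awk regular expression used to select lines
#   $2 = log file
# Note: awk -v interprets backslash escapes in the value assigned to pat.
count_per_hour() {
  awk -F : -v pat="$1" '
    $0 ~ pat {
      hour = $1             # first colon-delimited field is the hour
      sub(/^ +/, "", hour)  # drop padding so " 5" and "5" count together
      print hour
    }' "$2" | sort -n | uniq -c
}

# Usage (placeholders from the question):
# count_per_hour 'some_pattern' filename
```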

However, let's say your logs look like /var/log/syslog, which has lines that look like this:

Feb  9 01:23:45 mycomputer service[PID]: details...

In this case, you have to tell AWK where to look:

$ awk '/some_pattern/ { gsub(/:.*/,"",$3); print $3 }' file |sort |uniq -c

This selects the lines matching your pattern (much like grep), then deletes everything from the first colon onward in the third field (the time) and prints what remains (the hour). The rest works as described above.

A sample output (of either of the above variants):

This notes that there were twelve matches to my query at 7 am and that I didn't really start using this system until 11 am.
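The syslog variant can be wrapped the same way. This is a sketch assuming lines shaped like the /var/log/syslog example, where awk's default whitespace splitting makes the time the third field; count_per_hour_syslog is a hypothetical name. It applies sub to a copy of the field instead of gsub to $3; the result is the same, because /:.*/ consumes everything from the first colon to the end in a single match.

```shell
# Hypothetical helper for syslog-style lines such as
#   Feb  9 01:23:45 mycomputer service[PID]: details...
# With awk's default field splitting, the time is the third field.
#   $1 = awk regular expression used to select lines
#   $2 = log file
count_per_hour_syslog() {
  awk -v pat="$1" '
    $0 ~ pat {
      hour = $3
      sub(/:.*/, "", hour)  # keep only the part before the first colon
      print hour
    }' "$2" | sort | uniq -c
}

# Usage: count_per_hour_syslog 'some_pattern' /var/log/syslog
```

Because syslog hours are zero-padded ("01", "13"), a plain sort already puts them in order.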

@Swiss you are right. I was merely trying to mimic the logic of the original post. It would certainly be faster without the regex, but it's unclear whether there is further content to be filtered through. — Adam Katz, Feb 10 '15 at 00:34

score 1 · Accepted Answer · answered Feb 09 '15 at 23:02

Using egrep

for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" 'filename'; done

In ^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9], the ^ anchors the match to the start of the line, and [[:space:]]* allows any amount of leading whitespace, including none, before the hour. So grep matches a line whether it starts with whitespace or directly with the hour, but because only whitespace may precede $i, a line starting with 15:23:32 no longer matches when $i is 5.

For example, with $i=5 your original command matches both of these lines:

 5:23:23
15:23:23

while the command above matches only:

 5:23:23
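The before/after comparison can be reproduced with two made-up lines piped into grep -E (the POSIX spelling of egrep). sample_lines is a hypothetical helper, and the <some_pattern> part of the regex is left out to keep the sketch short.

```shell
# Two made-up log lines: a space-padded hour-5 entry and an hour-15 entry.
sample_lines() { printf '%s\n' ' 5:23:23 msg' '15:23:23 msg'; }

i=5

# Unanchored: "5:23:23" also occurs inside "15:23:23", so both lines match.
unanchored=$(sample_lines | grep -Ec "$i:[0-9][0-9]:[0-9][0-9]")

# Anchored: only optional leading whitespace may come before the hour.
anchored=$(sample_lines | grep -Ec "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9]")

echo "unanchored=$unanchored anchored=$anchored"  # unanchored=2 anchored=1
```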

Note that grep has a -c option that counts matching lines; the command above uses it instead of piping to wc -l.
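The loop prints one bare count per line, so the hour each count belongs to is implied only by its position. A minimal sketch that prints the hour next to each count, assuming the timestamp starts the line as in the question; count_by_hour is a hypothetical name, and your own pattern would go after the timestamp part of the regex.

```shell
# Hypothetical helper: print "HH count" for hours 0..23.
#   $1 = log file
count_by_hour() {
  for i in $(seq 0 23); do
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps scripts that use "set -e" running.
    n=$(grep -Ec "^[[:space:]]*$i:[0-9]{2}:[0-9]{2}" "$1" || true)
    printf '%2d %s\n' "$i" "$n"
  done
}

# Usage: count_by_hour filename
```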

score 0 · Answer 3 · David Hoelzer · 2015-02-09T21:33:12.130

grep "^[ 0-9][0-9]...

I think this is what you're looking for, unless I've misunderstood your question: add the space to the first bracket expression as an alternative to a digit, and anchor the pattern to the beginning of the line.

edited Feb 09 '15 at 21:33

answered Feb 09 '15 at 20:56

How can I use this in the for loop above, so that I get the counts by hour? – punekr12 Feb 09 '15 at 21:03
You can't use the regex shorthand in a character class. – Swiss Feb 09 '15 at 21:05

score 0 · Answer 4 · glenn jackman · 2015-02-09T21:38:19.007

To match a timestamp where the hour from 0 to 9 is space-padded or zero-padded:

With basic regular expressions:

grep '^\([ 01][0-9]\|2[0-3]\):[0-5][0-9]:[0-5][0-9]' file

or with extended regular expressions:

grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' file

ref: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html
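A quick way to see what the extended regex accepts is to feed it a few made-up timestamps. The sample lines below are assumptions for illustration, and samples is a hypothetical helper. The basic-regex form relies on \|, which is a GNU extension, so the ERE form is the more portable one to try.

```shell
# The ERE from the answer above: a space- or zero-padded hour 0-9,
# an hour 10-19, or 20-23, followed by :MM:SS.
re='^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}'

# Made-up sample lines: three valid timestamps and two that should fail.
samples() {
  printf '%s\n' \
    ' 5:23:32 space-padded' \
    '05:23:32 zero-padded' \
    '15:23:32 two-digit hour' \
    '24:00:00 hour out of range' \
    '5:23:32 unpadded hour'
}

samples | grep -E "$re"
# prints only the first three lines
```

Note that an unpadded single-digit hour such as "5:23:32" at the very start of a line is rejected, which matches the space-padded format described in the question.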

edited Feb 09 '15 at 21:38

answered Feb 09 '15 at 21:33

Tangential note: [leap seconds](https://en.wikipedia.org/wiki/Leap_second) occasionally show up as `23:59:60` – glenn jackman Feb 09 '15 at 22:08
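If leap seconds matter for your logs, one possible adjustment is to let the seconds field also be 60. This is a sketch, not something from the answers above; re_leap is a hypothetical name and the sample lines are made up.

```shell
# Hypothetical variant of the ERE above: the seconds field may also be 60.
re_leap='^([ 01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)'

printf '%s\n' '23:59:60 leap second' '23:59:61 not a valid second' \
  | grep -E "$re_leap"
# prints: 23:59:60 leap second
```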
