bash tail on a live log file, counting uniq lines with same date/time

Question

I'm looking for a good way to tail on a live log file, and display number of lines with the same date/time.

Currently this is working:

 tail -F /var/logs/request.log | [cut the date-time] | uniq -c

BUT the performance is not good enough. There is a delay of more than one minute, and it output in bulks of few lines each time.

Any idea?

would you consider writing your own loop (bash script) that processes the output of `tail` and updates a counter until the date-time changes, then prints the count and resets the counter? — Floris, Dec 12 '13 at 20:57
@yaccz - it (explicit bash script to process lines from `tail` without passing through `cut` and `uniq`) is one way to bypass the problem of pipe buffering. But I found a (marginally) more direct way to do that… see answer below. — Floris, Dec 13 '13 at 02:51

score 13 · Accepted Answer · edited May 23 '17 at 10:30

Your problem is most likely related to buffering in your system, not anything intrinsically wrong with your line of code. I was able to create a test scenario where I could reproduce it - then make it go away. I hope it will work for you too.

Here is my test scenario. First I write a short script that writes the time to a file every 100 ms (approx) - this is my "log file" that generates enough data that uniq -c should give me an interesting output every second:

#!/bin/ksh
while :
do
  echo The time is `date` >> a.txt
  sleep 0.1
done

(Note - I had to use ksh which has the ability to do a sub-second sleep)

In another window, I type

tail -f a.txt | uniq -c

Sure enough, you get the following output appearing every second:

   9 The time is Thu Dec 12 21:01:05 EST 2013
  10 The time is Thu Dec 12 21:01:06 EST 2013
  10 The time is Thu Dec 12 21:01:07 EST 2013
   9 The time is Thu Dec 12 21:01:08 EST 2013
  10 The time is Thu Dec 12 21:01:09 EST 2013
   9 The time is Thu Dec 12 21:01:10 EST 2013
  10 The time is Thu Dec 12 21:01:11 EST 2013
  10 The time is Thu Dec 12 21:01:12 EST 2013

etc. No delays. Important to note - I did not attempt to cut out the time. Next, I did

tail -f a.txt | cut -f7 -d' ' | uniq -c

And your problem reproduced - it would "hang" for quite a while (until there was 4k of characters in the buffer, and then it would vomit it all out at once).

A bit of searching online ( https://stackoverflow.com/a/16823549/1967396 ) told me of a utility called stdbuf . At that reference, it specifically mentions almost exactly your scenario, and they provide the following workaround (paraphrasing to match my scenario above):

tail -f a.txt | stdbuf -oL cut -f7 -d' ' | uniq -c

And that would be great… except that this utility doesn't exist on my machine (Mac OS) - it is specific to GNU coreutils. This left me unable to test - although it may be a good solution for you.

Never fear - I found the following workaround, based on the socat command (which I honestly barely understand, but I adapted from the answer given at https://unix.stackexchange.com/a/25377 ).

Make a small file called tailcut.sh (this is the "long_running_command" from the link above):

#!/bin/ksh
tail -f a.txt | cut -f7 -d' '

Give it execute permissions with chmod 755 tailcut.sh . Then issue the following command:

socat EXEC:./tailcut.sh,pty,ctty STDIO | uniq -c

And hey presto - your lumpy output is lumpy no more. The socat sends the output from the script straight to the next pipe, and uniq can do its thing.

score 4 · Answer 2 · answered Mar 28 '15 at 22:16

4

You may try logtop, (apt-get install logtop):

Usage:

tail -F /var/logs/request.log | [cut the date-time] | logtop

Example:

$ tail -f /var/log/varnish/varnishncsa.log  | awk '{print $4}' | logtop
5585 elements in 10 seconds (558.50 elements/s)
   1  690 69.00/s [28/Mar/2015:23:13:48
   2  676 67.60/s [28/Mar/2015:23:13:47
   3  620 62.00/s [28/Mar/2015:23:13:49
   4  576 57.60/s [28/Mar/2015:23:13:53
   5  541 54.10/s [28/Mar/2015:23:13:54
   6  540 54.00/s [28/Mar/2015:23:13:55
   7  511 51.10/s [28/Mar/2015:23:13:51
   8  484 48.40/s [28/Mar/2015:23:13:52
   9  468 46.80/s [28/Mar/2015:23:13:50

Columns are, from left to right:

Just row number
qte seen
hits per second
the actual line

answered Mar 28 '15 at 22:16

Julien Palard

8,736
2
37
44

Seems to me this answers a different question? – Floris Jul 21 '15 at 04:16
It seems that this is not a live feed - the output shows lines not in chronological order. I thought OP wanted to know the number of times a particular message was repeated _as it happens_. – Floris Jul 28 '15 at 10:56
logtop is live, but not chronologically ordered, but putting the bigger (number of occurences) first. That (often) make more sense. – Julien Palard Jul 28 '15 at 12:10
I agree it is a good solution to the problem OP is probably trying to solve - but not a solution to the question posed. That's my only gripe. Personally I'm happy to learn about a new tool! – Floris Jul 28 '15 at 12:14

score 1 · Answer 3 · answered Dec 12 '13 at 21:00

Consider how uniq -c is working.

In order to print the count, it needs to read all the unique lines and only once a line that is different from the previous one, it can print the line and number of occurences.

That's just how the algorithm fundamentally works and there is no way around it.

You can test this by running

touch a
tail -F a | uniq -c

And then one after another

echo 1 >> a
echo 1 >> a
echo 1 >> a

nothing happens. Only after you run

echo 2 >> a

uniq can print there were 3 "1\n" occurences.

Of course. Still, I'd expect the log lines to be printed out when the date/time field changes, which happens every second! But it outputs in delay of more than one minute and in bulks, not live. — zapp, Dec 12 '13 at 21:09

bash tail on a live log file, counting uniq lines with same date/time

3 Answers3