How efficient is the tac command on large files

Question

The tac command (cat reversed) can be used to read a file backwards, just like cat reads it from the beginning. I wonder how efficient this is. Does it have to read the whole file from the beginning and then reverse some internal buffer when it reaches the end?

I was planning on using it for a frequently called monitoring script which needs to inspect the last n lines of a file that may be several hundred megabytes in size. However, I don't want that to cause heavy I/O load or fill up cache space with otherwise useless information by reading through the file over and over again (about once per minute or so).

Can anyone shed some light on the efficiency of that command?

IIRC tail works by reading the file from the beginning, but that information could be out of date, too. That's why I am asking :) — Daniel Schneller, Jan 28 '15 at 17:09
However, reading http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_plain;f=src/tac.c;hb=refs/heads/master it seems there is an algorithm in place that skips towards the end of the file and starts looking for the separators there. I am not that fluent in C, but judging by the code comments and function names, it seems to be like that. — Daniel Schneller, Jan 28 '15 at 17:10
@DanielSchneller, tail will read from the beginning of the file if you do something awful like `cat foo | tail`, but if it's `tail foo` or even `tail — Charles Duffy, Oct 04 '16 at 17:18

score 8 · Accepted Answer · answered Oct 04 '16 at 17:23

When used correctly, tac is about as efficient as tail: it reads the file in 8K blocks, seeking backwards from the end.

"Correct use" requires, among other things, giving it a direct, seekable handle on your file:

tac yourfile   # this works fine

...or...

tac <yourfile  # this also works fine

NOT

# DON'T DO THIS:
# this forces tac to copy "yourfile" to a new temporary file, then uses its regular
# algorithm on that file.
cat yourfile | tac
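For the monitoring case in the question (inspecting only the last n lines), the seekable form can be combined with head. The following is a minimal sketch, assuming GNU coreutils (tac, head, seq, mktemp) are available; the function name last_lines_reversed and the demo file are hypothetical and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of a file, newest first.
last_lines_reversed() {
    file=$1
    count=$2
    # tac receives the file as a direct, seekable argument (not via cat).
    # head exits after $count lines; tac then gets SIGPIPE on its next
    # write and stops, so it should not walk the rest of the file.
    tac -- "$file" | head -n "$count"
}

# Demo on a throwaway file that stands in for a real log.
demo=$(mktemp)
seq 1 100000 > "$demo"
last_lines_reversed "$demo" 3   # the last three numbers, in reverse order
rm -f "$demo"
```

If the original line order is wanted instead, `tail -n 3 yourfile` is the more direct choice; the tac form is mainly useful when the newest line should come first.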

That said, I'd consider repeatedly running a tool of this nature a very inefficient way to scan logs, compared to using logstash or a similar tool that can feed into an indexed store and/or generate events for real-time analysis by a CEP engine such as Esper or Apache Flink.

Thanks for your reply. Re logstash etc.: I totally agree, however in this particular case that would have been a much bigger overhead than what was actually needed. — Daniel Schneller, Oct 17 '16 at 14:05
