
Possible Duplicate:
How would you implement tail efficiently?

A friend of mine was asked how he'd implement tail -n. To be clear, the task is to print the last n lines of the specified file.

I thought of using an array of n strings and overwriting them in a cyclic manner, so memory stays bounded by n lines. But given, say, a 10 GB file, this still reads every byte from the beginning, so it doesn't scale at all.
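
Roughly what I had in mind; a minimal sketch, with the argument handling kept deliberately thin:

    #include <algorithm>
    #include <cstddef>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc != 3) return 1;
        std::size_t n = std::stoul(argv[2]);     // number of lines to keep
        if (n == 0) return 0;
        std::ifstream in(argv[1]);

        std::vector<std::string> ring(n);        // the n most recent lines seen
        std::size_t count = 0;
        for (std::string line; std::getline(in, line); )
            ring[count++ % n] = std::move(line); // overwrite cyclically

        // Replay the surviving lines in order, oldest first.
        std::size_t kept = std::min(count, n);
        for (std::size_t i = count - kept; i < count; ++i)
            std::cout << ring[i % n] << '\n';
        return 0;
    }

Memory use is O(n) lines, but the getline loop still has to chew through all 10 GB from the front, which is what bothers me.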

Is there a better way to do this?

nikhil

2 Answers


Memory-map the file, iterate backwards from the end looking for an end-of-line character n times, then write everything from that point to the end of the file to standard output.
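
A minimal sketch of that approach, assuming POSIX (open/fstat/mmap) and a 64-bit address space so the whole file can be mapped at once; the argument parsing and error handling are kept deliberately small:

    #include <cstdio>
    #include <cstdlib>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char** argv) {
        if (argc != 3) { std::fprintf(stderr, "usage: %s FILE N\n", argv[0]); return 1; }
        long n = std::strtol(argv[2], nullptr, 10);
        if (n <= 0) return 0;

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { std::perror("open"); return 1; }
        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) return 0;

        void* m = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (m == MAP_FAILED) { std::perror("mmap"); return 1; }
        const char* p = static_cast<const char*>(m);

        // Scan backwards counting newlines; a trailing newline terminates the
        // last line, so skip it rather than counting it as a line boundary.
        off_t i = st.st_size;
        if (p[i - 1] == '\n') --i;
        long newlines = 0;
        while (i > 0 && newlines < n)
            if (p[--i] == '\n') ++newlines;
        if (newlines == n) ++i;   // i sits on the n-th newline; start just past it

        std::fwrite(p + i, 1, st.st_size - i, stdout);
        munmap(m, st.st_size);
        close(fd);
        return 0;
    }

Although the whole file is mapped, the kernel only faults in the pages that the backwards scan and the final write actually touch.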

You could avoid mapping the whole file by mapping just the last X KB (say, a couple of memory pages) and searching there. If that region doesn't contain enough lines, memory-map a larger one until you get what you want. You could use a heuristic to guess how much memory to map (say, 1 KB per line as a rough estimate). I would not really do this, though.
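
For completeness, a rough sketch of that doubling variant; the helper name find_tail_offset, the one-page initial window, and the doubling policy are my own choices, not anything prescribed above:

    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    // Offset at which the last n lines begin (assumes size > 0, n > 0).
    // Maps only a suffix of the file, doubling the window until it holds
    // n newlines; returns 0 (whole file) when there are fewer lines.
    off_t find_tail_offset(int fd, off_t size, long n) {
        const long page = sysconf(_SC_PAGESIZE);
        off_t win = page;                        // initial guess: one page
        for (;;) {
            off_t off = (win >= size) ? 0 : size - win;
            off -= off % page;                   // mmap offsets must be page-aligned
            off_t len = size - off;
            void* m = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, off);
            if (m == MAP_FAILED) return 0;       // fall back to the whole file
            const char* p = static_cast<const char*>(m);

            off_t i = len;
            if (p[i - 1] == '\n') --i;           // ignore the trailing newline
            long newlines = 0;
            while (i > 0 && newlines < n)
                if (p[--i] == '\n') ++newlines;
            munmap(m, len);

            if (newlines == n) return off + i + 1;  // boundary found in this window
            if (off == 0) return 0;              // the file has fewer than n lines
            win *= 2;                            // too few newlines: widen and retry
        }
    }

It assumes the same open/fstat setup as the full-map version; the returned offset is where the final write to standard output should begin.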

David Rodríguez - dribeas

"It depends", no doubt. Given the size of the file should be knowable, and given a sensible file-manipulation library which can 'seek' to the end of a very large file without literatally traversing each byte in turn or thrashing virtual memory, you could simply scan backwards from the end counting newlines.

When you're dealing with files that big, though, what do you do about the degenerate case where n is close to the number of lines in the multi-gigabyte file? Storing the lines in temporary strings won't scale then, either.

Rook