6

It's a Microsoft interview question.

Read the last n lines of a file using C (precisely).

There are many ways to achieve this; a few of them:

-> Simplest of all: in a first pass, count the number of lines in the file, and in a second pass display the last n lines.

-> Or maybe maintain a doubly linked list with one node per line, and display the last n lines by traversing the list backwards from the tail to the nth-last node.

-> Implement something along the lines of tail -n fname.

-> To optimize further, we can keep an array of n pointers (a double pointer) and store each line dynamically in round-robin fashion until we reach the end of the file.

For example, if the file has 10 lines and we want the last 3, we could create an array of buffers buf[3][] and, at run time, keep mallocing and freeing the buffers in a circular way until we reach the last line, keeping a counter to track the current index in the array.
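The round-robin idea above could be sketched roughly as follows. This is only an illustration, not a reference answer: `read_line` and `tail_ring` are hypothetical names, and the sketch assumes lines end in `'\n'` and contain no embedded NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length into a freshly
 * malloc'd string (newline kept). Returns NULL at EOF or if an
 * allocation fails. */
static char *read_line(FILE *in)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    int c;

    if (!buf) return NULL;
    while ((c = fgetc(in)) != EOF) {
        if (len + 2 > cap) {            /* room for this char plus the NUL */
            char *tmp;
            cap *= 2;
            tmp = realloc(buf, cap);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
        }
        buf[len++] = (char)c;
        if (c == '\n') break;
    }
    if (len == 0) { free(buf); return NULL; }
    buf[len] = '\0';
    return buf;
}

/* Prints the last n lines of `in` to `out` using a circular array of
 * n line pointers. Returns the number of lines printed, or -1 if the
 * array cannot be allocated. */
long tail_ring(FILE *in, size_t n, FILE *out)
{
    char **ring;
    char *line;
    size_t total = 0, count, i;

    assert(in != NULL && out != NULL);
    if (n == 0) return 0;
    ring = calloc(n, sizeof *ring);
    if (!ring) return -1;

    while ((line = read_line(in)) != NULL) {
        free(ring[total % n]);          /* drop the line leaving the window */
        ring[total % n] = line;
        total++;
    }

    count = total < n ? total : n;
    for (i = 0; i < count; i++) {
        size_t slot = (total - count + i) % n;  /* oldest kept line first */
        fputs(ring[slot], out);
        free(ring[slot]);
    }
    free(ring);
    return (long)count;
}
```

Calling `tail_ring(fp, 3, stdout)` on a 10-line file would keep at most 3 lines in memory at any time, at the cost of reading the whole file once.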

Can anyone please help me with a more optimized solution, or at least tell me whether any of the above approaches would give the correct answer, or whether there is another popular approach for this kind of question?

Anshul

3 Answers

9

You can use a queue to store the last n lines seen. When you reach EOF, just print the queue.

Another way is to read blocks of 1024 bytes from the end of the file towards the beginning. Stop when you have found n \n characters (not counting a newline that terminates the very last line) and print out the last n lines.
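A rough sketch of the block-wise backward scan might look like this. The names `find_tail_offset` and `tail_blocks` are made up for illustration, and the code assumes a regular file opened in binary mode (`"rb"`) on a platform where `fseek` to `SEEK_END` and `ftell` give byte offsets, which the C standard does not strictly guarantee.

```c
#include <assert.h>
#include <stdio.h>

enum { BLOCK_SIZE = 1024 };

/* Scans backwards in BLOCK_SIZE chunks and returns the byte offset at
 * which the last n lines begin, or -1 on an I/O error. */
long find_tail_offset(FILE *fp, size_t n)
{
    char block[BLOCK_SIZE];
    long size, pos;
    size_t seen = 0;

    assert(fp != NULL);
    if (fseek(fp, 0, SEEK_END) != 0) return -1;
    size = ftell(fp);
    if (size < 0) return -1;
    if (n == 0) return size;

    pos = size;
    while (pos > 0) {
        long len = pos < BLOCK_SIZE ? pos : BLOCK_SIZE;
        long i;

        pos -= len;
        if (fseek(fp, pos, SEEK_SET) != 0) return -1;
        if (fread(block, 1, (size_t)len, fp) != (size_t)len) return -1;

        for (i = len - 1; i >= 0; i--) {
            if (block[i] != '\n') continue;
            if (pos + i == size - 1) continue;  /* terminator of the final line */
            if (++seen == n) return pos + i + 1;
        }
    }
    return 0;   /* fewer than n lines: the tail is the whole file */
}

/* Copies the last n lines of fp to out. Returns the starting offset
 * of the copied region, or -1 on error. */
long tail_blocks(FILE *fp, size_t n, FILE *out)
{
    char block[BLOCK_SIZE];
    size_t got;
    long start = find_tail_offset(fp, n);

    if (start < 0 || fseek(fp, start, SEEK_SET) != 0) return -1;
    while ((got = fread(block, 1, sizeof block, fp)) > 0)
        fwrite(block, 1, got, out);
    return start;
}
```

Only the offset is tracked during the scan, so long lines that span several blocks need no buffer joining; the data is read forward once the start position is known.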

perreal
  • 2
    What if the lines are 500 bytes each? It's going to be a big pain managing the buffer joins. – Anshul Mar 05 '13 at 05:13
  • 1
    @ansh, right. In that case starting backwards might still make sense, though, since you might discard gigabytes of data before the last n lines, and you may not want to buffer the data but just locate the offset – perreal Mar 05 '13 at 05:16
  • Creating an offset is what I was thinking... maybe perreal's idea could be effective –  Mar 05 '13 at 05:21
  • 1
    If the lines are long, just push the buffer onto a stack along with a pointer to the start of the first full line in the buffer. And then read another block starting at that position minus your buffer size. Basically, read overlapping buffers. No need to mess with joining the buffers. Or, better yet, just read a 64K buffer. You might still have to deal with long lines, but it'll be pretty rare. – Jim Mischel Mar 05 '13 at 05:55
  • I was wondering about using fseek, and I see no one mentions it. Does fseek have to parse the entire file before it finds EOF? – Kiith Nabaal Mar 05 '13 at 06:04
  • @KiithNabaal, the second method above uses fseek. It does not scan the whole file to find EOF. – perreal Mar 05 '13 at 06:06
  • It seems to me like maybe you can just fseek to EOF, and then count how many bytes are in between two newline chars, and then do a memcpy or fgets. And basically once you hit the 4th \n, you would know you are done. – Kiith Nabaal Mar 05 '13 at 06:13
  • Well I don't know where you get the `4` from, but if you mean `n` then the problem is you cannot read a file backwards. So you need to read blocks from the end of file. – perreal Mar 05 '13 at 06:15
  • Ah yes, I meant n, I was thinking of the last 3 lines lol. I don't see why you can't read it backwards if you give fseek a -1 offset, and then read the current byte to see if it is a \n or not. Once you find the next \n, you would know you just read a line, and then just read in the number of bytes you counted. It may be redundant, but it seems like it is better than reading all of the characters in the file, especially if it is large. Hopefully this makes sense, I don't think I am being too clear – Kiith Nabaal Mar 05 '13 at 06:27
  • seeking byte by byte is very slow, especially if you have spinning drives – perreal Mar 05 '13 at 06:33
4

You can have two file pointers, both initially pointing to the beginning of the file.

Keep advancing the first pointer until it finds a '\n' character, and store the file position each time it finds one.

Once it finds the (n+1)th '\n', assign the oldest stored position to the second file pointer. Keep doing the same until EOF.

So when the first file pointer is at EOF, the second will be n '\n' characters behind it. Then print all characters from the second file pointer to EOF.

This solution finds the last n lines of the file in a single pass.
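One possible reading of this approach, sketched with a single `FILE *` and a circular array of saved offsets instead of a literal second file pointer (that substitution, and the name `tail_offsets`, are assumptions made for the example). It counts bytes itself, so it assumes the stream was opened in binary mode and is seekable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Scans fp once from the beginning, remembering where each of the
 * most recent n lines starts, then seeks back and copies from the
 * oldest remembered start to EOF. Returns the number of lines
 * printed, or -1 on error. */
long tail_offsets(FILE *fp, size_t n, FILE *out)
{
    long *starts;
    long pos = 0;
    size_t lines = 0, count;
    int c, at_line_start = 1;

    assert(fp != NULL && out != NULL);
    if (n == 0) return 0;
    starts = malloc(n * sizeof *starts);
    if (!starts) return -1;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            starts[lines % n] = pos;    /* overwrite the oldest saved start */
            lines++;
        }
        at_line_start = (c == '\n');
        pos++;
    }

    count = lines < n ? lines : n;
    if (count > 0) {
        /* The slot holding the start of the nth-last line. */
        long start = starts[(lines - count) % n];
        if (fseek(fp, start, SEEK_SET) != 0) { free(starts); return -1; }
        while ((c = fgetc(fp)) != EOF)
            fputc(c, out);
    }
    free(starts);
    return (long)count;
}
```

Memory use here is n offsets rather than n lines, though printing still re-reads the tail of the file after the scan.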

Hitesh Menghani
1

How about using a memory-mapped file and scanning it backwards? This eliminates the hard work of updating the buffer window every time the lines happen to be longer than your buffer space. Then, when you find a \n, push its position onto a stack. This works in O(L), where L is the number of characters to output. So there is nothing really better than that, is there?
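A minimal sketch of the memory-mapped variant, assuming a POSIX system (`open`, `fstat`, `mmap`) and a regular file small enough to map in one piece. The function name `tail_mmap` is hypothetical, and since only the start of the nth-last line is needed here, the sketch keeps a counter instead of pushing every newline position onto a stack.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the file read-only, scans backwards for the n-th newline, and
 * writes everything after it to out. Returns the number of bytes
 * written, or -1 on error. */
long tail_mmap(const char *path, size_t n, FILE *out)
{
    struct stat st;
    const char *data;
    size_t size, i, start = 0, seen = 0;
    int fd;

    assert(path != NULL && out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    size = (size_t)st.st_size;
    if (size == 0 || n == 0) { close(fd); return 0; }  /* mmap rejects length 0 */

    data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping outlives the descriptor */
    if (data == MAP_FAILED) return -1;

    i = size;
    if (data[size - 1] == '\n') i--;    /* skip the final line's terminator */
    while (i > 0) {
        i--;
        if (data[i] == '\n' && ++seen == n) {
            start = i + 1;              /* first byte of the n-th last line */
            break;
        }
    }

    fwrite(data + start, 1, size - start, out);
    munmap((void *)data, size);
    return (long)(size - start);
}
```

The backward scan touches only the pages holding the last n lines, which is where the O(L) behaviour described above comes from.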

phoeagon