
An existing application writes an output file constantly as it runs. I want to be able to read this file in another (C++) application, line by line, for external processing.

A realistic scenario is that the existing application has been running for some time. My new application is launched and works through the output file, 'catching up' to the most recent entry. It then waits for new lines to be written to the file.

I do not need to parse the entire file, only to read it line-by-line - it's not XML or JSON or anything like that. As the file may be very large, I definitely don't want to load it all into memory. It's been a very long time since I worked on low-level file access in C++ so my questions are:

  • Do the standard file APIs allow me to read a file without caching it in memory, and if so, how can I control this?
  • Does reading from a file that is being written to require any special attention?

I know this can be done at operating-system level, but I'm not sure how this is exposed through the C++ APIs in the standard library.

Mr. Boy
  • Does this answer your question? [Implement "tail -f" in C++](https://stackoverflow.com/questions/2696976/implement-tail-f-in-c) – Botje Oct 21 '20 at 17:11
  • What do you want to do with the file? Read line by line but not parse it - do you discard those lines? It is not clear what the end goal is. – SergeyA Oct 21 '20 at 17:17
  • As long as the writing app creates the file with write access and read sharing, and the reading app opens the file with read-only access, you should be OK. In my case, I have a service that writes to a log file, and a viewer that displays the log in real-time. I use a memory mapped view of the file to access the data quickly, and have the service notify the viewer when new data has been written so the viewer can map and display it, but those are not strictly requirements in this case. – Remy Lebeau Oct 21 '20 at 20:12
  • @SergeyA I'm not sure it matters too much but I edited slightly... basically process each line in bespoke logic – Mr. Boy Oct 21 '20 at 20:12
  • @Botje I'm not sure. There's no accepted answer and the highly rated answer basically says to do what I'm asking, but not how. So I don't think that's a dupe – Mr. Boy Oct 21 '20 at 20:14
  • The question is operating system specific. A file on Linux is not exactly the same as a file on Windows. Did you consider using [sqlite](http://sqlite.org/) ? – Basile Starynkevitch Oct 21 '20 at 20:27
  • @BasileStarynkevitch how would sqlite help on generic text files? – Mr. Boy Oct 22 '20 at 09:08
  • sqlite might help to organize your data more efficiently on the disk. Without any [mre], we cannot guess what your application is supposed to do. – Basile Starynkevitch Oct 23 '20 at 17:22
  • @BasileStarynkevitch I think you've misread the question. The existing text file format cannot be changed. And I'm not asking you to guess about my application, but to answer the specific question. What I do with the data is out of scope. Not every question has an MCE in the real world. In this case, an MCE would be the answer! – Mr. Boy Oct 26 '20 at 16:38

1 Answer


There are a few issues to be aware of when reading a growing file line-by-line:

  1. The producer may not necessarily write a line into the file atomically. std::getline/gets strip off the trailing \n, so you don't know whether a full line was read or EOF was hit mid-line (illustrated after this list).
  2. There are no facilities in the C++ standard library to wait for a file to grow.
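As a minimal illustration of issue 1, assuming the file is read with `std::getline` on an `std::ifstream` (the file name here is made up):

```cpp
#include <fstream>
#include <string>

int main() {
    std::ifstream in("output.log");  // hypothetical file name
    std::string line;
    while (std::getline(in, line)) {
        // If getline stopped because it hit EOF rather than '\n', eofbit is
        // set but failbit is not: "line" may be only the beginning of a line
        // the producer is still in the middle of writing.
        if (in.eof()) {
            // Partial last line - do not treat it as complete yet.
        }
        // ... process complete lines ...
    }
}
```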

You would need to write your own getline (a sketch follows these steps) that:

  1. Reads into a buffer of fixed length, which must be at least as big as the longest line your producer could write. Use platform-specific functions, such as POSIX read, so that you don't need to keep clearing the EOF state on an ifstream or FILE*.
  2. Finds complete lines in the buffer and passes them to the caller. Any incomplete line is moved to the beginning of the buffer, and the next read appends after it.
  3. When read hits EOF, waits for the file to grow using platform-specific means, such as inotify. That may be tricky to implement without race conditions, so you may prefer to simply retry the read after a reasonable timeout. Go to step 1.
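Here is a minimal sketch of such a reader, assuming POSIX `open`/`read` and a plain polling retry instead of inotify. The file name, buffer size and poll interval are arbitrary assumptions, rotation/truncation of the file is not handled, and for simplicity the incomplete tail is accumulated in a `std::string` rather than moved to the front of the fixed read buffer:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <iostream>
#include <string>
#include <thread>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: " << argv[0] << " <file>\n"; return 1; }

    int fd = ::open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    std::string partial;   // holds an unterminated trailing line between reads
    char buf[64 * 1024];   // fixed-size read buffer (arbitrary size)

    for (;;) {
        ssize_t n = ::read(fd, buf, sizeof buf);
        if (n > 0) {
            partial.append(buf, static_cast<std::size_t>(n));
            // Extract every complete line; keep the unterminated remainder.
            std::size_t pos;
            while ((pos = partial.find('\n')) != std::string::npos) {
                std::string line = partial.substr(0, pos);
                partial.erase(0, pos + 1);
                // Bespoke per-line processing goes here instead of this print.
                std::cout << line << '\n';
            }
        } else if (n == 0) {
            // EOF: the producer has not appended more data yet. Poll after a
            // short delay; inotify (or similar) could replace this.
            std::this_thread::sleep_for(std::chrono::milliseconds(200));
        } else {
            std::perror("read");
            break;
        }
    }

    ::close(fd);
}
```

Because the file descriptor's offset stays where the last read stopped, a later `read` simply returns whatever the producer has appended since, and no EOF state needs to be cleared, which is the advantage over ifstream/FILE* mentioned in step 1.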
Maxim Egorushkin
  • On issue 2, yeah I'd assumed a timer would be appropriate if EOF is returned. For that matter, I am happy to wait until a full line _is_ terminated with `\n`, unless I simply keep reading as many bytes as I can and do the newline checking myself? – Mr. Boy Oct 21 '20 at 20:16
  • @Mr.Boy Yes, you keep reading as much as you can and keep scanning the buffer for `\n`. – Maxim Egorushkin Oct 21 '20 at 20:19
  • But as far as my query goes, I don't need to be worried about the whole file getting loaded into memory - standard libraries handle seeking through the file in an efficient way? – Mr. Boy Oct 21 '20 at 20:20
  • 1
    @Mr.Boy Standard libraries never read an entire file into memory, unless it is smaller than the read buffer. Using POSIX `read` you only read your buffer size bypassing standard library buffers. – Maxim Egorushkin Oct 21 '20 at 20:22