2

I am reading a file in a while loop from start to end:

FILE *file;
file = fopen(path_to_file, "r");
char *line = NULL;
size_t len = 0;

while (getline(&line, &len, file) > 0) {
        delete_line_from_file(line);
}

fclose(file);

The function delete_line_from_file() removes the line passed to it from the file. It reads in the whole file via open(path, O_RDONLY | O_CLOEXEC) + read() + close(), then removes the line from the buffer and writes the whole buffer back to the same file via open(path, O_WRONLY | O_TRUNC | O_CLOEXEC) + write() + close(). The read() is done under an advisory read lock (set via struct flock lk) and the write() under an advisory write lock.

When I read the file, some lines get missed. This has something to do with reading the file from start to finish in one loop while also writing to it. If I read in the whole file and go through the buffer line by line, no lines get missed. (This is my preferred solution so far.) There are also no mistakes made when truncating and writing the file. The missed lines are still in the file after the loop finishes.
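The whole-file approach mentioned above can be sketched roughly as follows. The helper names slurp() and next_line() are hypothetical and not part of the question's code; the sketch assumes the file fits in memory and that the snapshot is taken before any deletion starts.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the rest of the stream into a heap buffer.
 * Returns NULL on error; the caller frees the result. */
static char *slurp(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)              /* short read: EOF or error */
            break;
        cap *= 2;                   /* buffer full: grow and keep reading */
        char *tmp = realloc(buf, cap);
        if (!tmp) { free(buf); return NULL; }
        buf = tmp;
    }
    if (ferror(fp)) { free(buf); return NULL; }
    *out_len = len;
    return buf;
}

/* Hypothetical helper: return the line starting at *cursor and advance
 * *cursor past it. The line is NOT NUL-terminated; *line_len includes the
 * trailing '\n' if there is one. Returns NULL when cursor reaches end. */
static const char *next_line(const char **cursor, const char *end,
                             size_t *line_len)
{
    assert(cursor != NULL && line_len != NULL);
    const char *p = *cursor;
    if (p >= end)
        return NULL;
    const char *nl = memchr(p, '\n', (size_t)(end - p));
    const char *next = nl ? nl + 1 : end;
    *line_len = (size_t)(next - p);
    *cursor = next;
    return p;
}
```

Because the loop walks a private snapshot, rewriting the file during the loop cannot shift the read position. Each slice would need to be copied into a NUL-terminated string before being handed to a function like delete_line_from_file().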

Can I make sure that my while-loop does not miss a line and cleanly empties the file? The file needs to be emptied line-by-line. It cannot be just truncated.

Here is one possible solution I had in mind. Mirror the file via fstat(fileno(file), &fbuf) and check its size with if (fbuf.st_size != 0) fseek(file, 0, SEEK_SET); but that seems inefficient.

lord.garbage
  • Advisory locks are advisory. Your line-by-line reading loop does not use an advisory lock, therefore it can still read the file during updates. All parties, both readers and writers, must take the lock for the lock to affect them. It'd be easier to use record locks and whiteout (overwriting the contents with ignored data like spaces or zeros), so lock durations would be short; and only occasionally do a "repack", removing all whiteout. For removing a line, it's sufficient to lock and rewrite the file contents up to the end of the file, then truncate. – Nominal Animal Aug 22 '15 at 06:05
  • The last part "For removing a line, it's sufficient to lock and rewrite the file contents up to the end of the file, then truncate." I don't quite understand yet. Maybe this is something you didn't assume because I didn't specify so in my question: what if I were to delete a line from the middle of the file? – lord.garbage Aug 22 '15 at 06:38
  • If your file contents are `ABCDE`, and you delete `C`, you don't need to read and rewrite the entire file; you can just start reading from start of `D` till end of file, and write starting at start of `C`, and when you reach the end, truncate the file to the length where the final write ended. During this, you need an exclusive/write lock on the file (at least from start of `C` till end of file). If your line-by-line reader takes a read lock on the file, outside the loop, then it always gets a consistent view. Other strategies are possible, but more complicated. – Nominal Animal Aug 22 '15 at 06:46
  • Thanks! If I may ask, what do you think of `mmap()`ing the file and deleting the line? – lord.garbage Aug 22 '15 at 07:09
  • `mmap()` + `memmove()` (+ `mremap()` if necessary) + `msync()` + `ftruncate()` should work just fine, assuming you hold an advisory exclusive/write lock on at least the latter part of the file (from the start of the modification forwards) for the entire duration of the operation. Depending on your exact use scenario, it might make the most sense. I'd need to know more -- how readers read the file (fast? slow?), and how modifiers modify the file (delete lines? append lines? insert? overwrite?) -- to suggest a specific approach, really. – Nominal Animal Aug 22 '15 at 17:20
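
The shift-and-truncate deletion described in the comments above could look roughly like this. It is a minimal sketch, assuming POSIX and a descriptor opened with O_RDWR; the name delete_range() and its byte-offset interface are hypothetical, and finding the offsets of the line to delete is left to the caller. Short writes are retried, but EINTR is not handled.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: remove `len` bytes starting at offset `start` from
 * the file open read-write on `fd`. Returns 0 on success, -1 on error. */
static int delete_range(int fd, off_t start, off_t len)
{
    assert(start >= 0 && len > 0);

    /* Advisory write lock from `start` to end of file (l_len == 0 means
     * "until EOF"). It only protects against readers that also lock. */
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = start, .l_len = 0 };
    if (fcntl(fd, F_SETLKW, &lk) == -1)
        return -1;

    char buf[4096];
    off_t src = start + len;    /* first byte after the deleted range */
    off_t dst = start;          /* where that byte has to end up */
    ssize_t n;
    int rc = 0;

    /* Copy the tail of the file down over the deleted range. */
    while ((n = pread(fd, buf, sizeof buf, src)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = pwrite(fd, buf + off, (size_t)(n - off), dst + off);
            if (w <= 0) { rc = -1; goto unlock; }
            off += w;
        }
        src += n;
        dst += n;
    }

    /* Cut off the now-duplicated bytes at the end. */
    if (n < 0 || ftruncate(fd, dst) == -1)
        rc = -1;

unlock:
    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);
    return rc;
}
```

With the file contents `A\nB\nC\nD\nE\n`, calling delete_range(fd, 4, 2) would drop the `C` line and leave the rest in order. The reading loop would still need to take a read lock on the same file for the lock to have any effect on it.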

2 Answers

0

So is the goal to empty the file completely?

Why don't you open the file as such:

open("file", O_TRUNC | O_WRONLY);

This will open the file with truncation. Alternatively, and perhaps a better solution, you can do this:

fopen("file", "w");

fopen() with the "w" mode truncates an existing file named "file" to zero length, or creates it if it does not exist.

gab64
0

Use fseek and ftell inside your loop.

Two processes modifying the same file is a recipe for problems. Maybe you need to use a pipe(2).

sureshvv