I want to do something similar to sed -i 's/abc/def/' file, but without a temp file. In my case the match and the replacement are the same length. Is the following safe?

fd = open(file, O_RDWR);
fstat(fd, &sbuf);
mm = mmap(0, sbuf.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
i = 0;
while (i < sbuf.st_size) {
   memcpy(tmpbuf, mm + i, BUFSIZ);  // read from mem to tmpbuf (BUFSIZ at a time)
   if ((p = strstr(tmpbuf, needle))) { // match found
     memcpy(mm + i + (p - tmpbuf), replace, strlen(replace)); // strlen(replace) == strlen(needle)
   }
   i += BUFSIZ;
}
munmap(mm, sbuf.st_size);
fsync(fd);
close(fd);

(Error handling omitted for brevity.)

Also, I'm not sure whether mmap makes this any faster.

Ani

1 Answer

It depends on what you mean by "safe". Unlike writing a temp file and atomically renaming it over the old file once you're done, this operation is not atomic: other processes can see the file in an intermediate, partially modified state. Moreover, there is no ordering between the stores. Other processes could see the end of a replacement before its beginning, or see the stores in any other conceivable order, and, if they also use mmap without memory barriers, possibly even in mutually inconsistent orders. Note that there's really nothing special about mmap here; you could do the same thing with write, too.

If none of these constitute "unsafety" for you, then the operation is totally safe. In particular it won't truncate the file or anything like that.
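
For the single-writer case, a minimal sketch of the in-place replacement might look like the following. It is an illustration, not the asker's code: the function name replace_in_place is hypothetical, it replaces every occurrence rather than only the first one per line as sed's s/abc/def/ would, and it searches the mapping directly instead of copying BUFSIZ-sized chunks into a temporary buffer. Searching the mapping directly sidesteps three problems that the chunked loop in the question appears to have: the final memcpy can read past the end of the mapping, strstr needs a NUL-terminated buffer, and a match that straddles a chunk boundary is missed. The sketch also calls msync before munmap so the modified pages are written back before it returns. None of this changes the atomicity or ordering caveats above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): overwrite every occurrence of
 * `needle` with `repl` in the file at `path`. Both strings must have the same
 * nonzero length. Returns the number of replacements, or -1 on error. */
long replace_in_place(const char *path, const char *needle, const char *repl)
{
    size_t n = strlen(needle);
    if (n == 0 || strlen(repl) != n)
        return -1;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)sb.st_size;
    if (size < n) {            /* nothing can match; also avoids a zero-length mmap */
        close(fd);
        return 0;
    }

    char *mm = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mm == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Scan the mapping byte by byte; memcmp needs no NUL terminator and
     * there are no chunk boundaries for a match to straddle. */
    long count = 0;
    size_t i = 0;
    while (i + n <= size) {
        if (memcmp(mm + i, needle, n) == 0) {
            memcpy(mm + i, repl, n);   /* same length, so the file size is unchanged */
            count++;
            i += n;
        } else {
            i++;
        }
    }

    int rc = 0;
    if (msync(mm, size, MS_SYNC) < 0)  /* write modified pages back before unmapping */
        rc = -1;
    if (munmap(mm, size) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc < 0 ? -1 : count;
}
```

Calling replace_in_place("file", "abc", "def") returns the number of replacements made, or -1 on error, including when the two strings differ in length.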

R.. GitHub STOP HELPING ICE
  • If the host crashes during execution, you could end up with a file with a scattering of partial modifications. So the problem isn't limited to an ephemeral view of the file. – rici Feb 28 '20 at 05:42
  • Thanks. I should have been a bit more elaborate. In my case, the file is accessed and written by a single process (not multi-threaded). Should I do an `fsync` after every `memcpy` (the _replace_ operation)? – Ani Feb 28 '20 at 08:45
  • @rici: that's true too, but can happen anyway in the event of a system crash even with atomic renaming, which is only as good as the OS's efforts to harden against data loss on crash (including journaling filesystems, etc.) – R.. GitHub STOP HELPING ICE Feb 28 '20 at 14:00
  • @vyom: It doesn't help order within the individual memcpys. – R.. GitHub STOP HELPING ICE Feb 28 '20 at 14:01
  • @R..GitHubSTOPHELPINGICE: it should not be too hard for the filesystem's atomic rename to guarantee that you end up either with the original or the complete new file, even in the face of a host crash. Of course, if the disk itself is damaged you could end up with neither. But it should protect you from an inconsistent result. – rici Feb 28 '20 at 14:06
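
The temp-file-and-rename approach that the answer and the comments contrast with in-place editing could look roughly like the sketch below. The function name write_file_atomically and the ".XXXXXX" temp-name suffix are assumptions made for illustration. The caller is assumed to have already built the new contents in memory. As the comments note, how well this holds up across a host crash depends on the filesystem; the sketch only calls fsync on the temp file before renaming it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of `path` with `len` bytes from
 * `data` by writing a temp file in the same directory and renaming it over
 * `path`. Readers see either the old file or the complete new one.
 * Returns 0 on success, -1 on error. */
int write_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int k = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (k < 0 || (size_t)k >= sizeof tmp)
        return -1;

    /* mkstemp creates the file with mode 0600; the original file's
     * permissions and ownership are not preserved by this sketch. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    int rc;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing anything: retry */
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    if (fsync(fd) < 0)             /* flush the data before it becomes visible */
        goto fail;

    rc = close(fd);
    fd = -1;
    if (rc < 0)
        goto fail;

    if (rename(tmp, path) < 0)     /* atomic replacement of the old file */
        goto fail;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

The temp file is created next to the target because rename is only atomic within a single filesystem.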