
On my embedded system, I want to make sure that the data is safely written when I close a file - if the system reports that the data was saved, the user should be able to remove power immediately.

I know that the proper way to do this is fsync(), fclose(), and then fsync() on the directory (cf. this blog entry). However, it's a bit tricky to get a file descriptor for the directory in my case (I'd have to go through /proc/self/fd to recover the filename and derive the directory from there). It would be much simpler for me to just call syncfs() on the entire filesystem, since I know that this is the only file open on that filesystem anyway.
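
For reference, the fsync(), fclose(), fsync()-on-directory sequence can be sketched roughly as follows when the path is known. This is only a sketch: close_durably is a hypothetical helper name, and it assumes the file has not been renamed or moved since it was opened.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: flush a stdio stream, sync its data, close it, then
 * sync the directory that contains `path`. Returns 0 on success, -1 on error. */
int close_durably(FILE *fp, const char *path)
{
    int ok = (fflush(fp) == 0) && (fsync(fileno(fp)) == 0);
    if (fclose(fp) != 0)
        ok = 0;
    if (!ok)
        return -1;

    char *copy = strdup(path);          /* dirname() may modify its argument */
    if (copy == NULL)
        return -1;
    int dirfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    free(copy);
    if (dirfd < 0)
        return -1;

    int rc = fsync(dirfd);              /* make the directory entry durable */
    if (close(dirfd) != 0)
        rc = -1;
    return rc;
}
```

The stream is closed even when the flush or sync fails, so the caller never has to clean up a half-closed FILE *.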

Now my questions are:

  • Is it sufficient to do syncfs()?
  • Do I need to fclose() the FILE * first (for the directory entry to be up-to-date)? Or is fflush() sufficient?
  • If it needs to be closed, is it useful to dup() the file descriptor before closing so I can use it directly for syncfs()?
Arnout

2 Answers


First of all, don't mix standard library <stdio.h> calls (like fprintf(3) or fopen(3)) with system calls (like open(2), close(2) or sync(2)). The former are library routines that hold data temporarily in in-process buffers the kernel knows nothing about; the latter are operating system interfaces that hand the data over to the kernel, which is responsible for it from then on. They are easy to tell apart: the former operate on FILE * streams, while the latter operate on int file descriptors.

So if you use a system call to make sure your data is properly synced to disk, it is absolutely necessary to fflush(3) your process's buffered data first, before the sync(2) or fsync(2) call.
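
To see why the fflush(3) matters, the following sketch compares the file size the kernel reports before and after flushing. It assumes a regular file, for which stdio is normally fully buffered; kernel_size and show_buffering are hypothetical helper names.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of `path` as the kernel sees it, or -1 on error. */
long kernel_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes 5 bytes through stdio and reports the kernel-visible file size
 * before and after fflush(). Returns 0 on success, -1 on error. */
int show_buffering(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fputs("hello", fp);            /* lands in the stdio buffer only */
    *before = kernel_size(path);   /* typically 0: nothing reached the kernel yet */
    if (fflush(fp) != 0) {         /* hands the buffer to the kernel via write(2) */
        fclose(fp);
        return -1;
    }
    *after = kernel_size(path);    /* the kernel now has all 5 bytes */
    return fclose(fp) == 0 ? 0 : -1;
}
```

An fsync(2) issued at the point where `before` is measured would have nothing to write, because the kernel has not seen the data yet.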

No sync(2) is guaranteed to happen at fclose(3) or close(2) time, nor in the atexit() callbacks that run when your process calls exit().
The operating system's buffers are written back lazily for performance reasons, and close(2) is not an event that triggers a write-back. Keep in mind that many processes can be reading and writing the same file at the same time, so having every close(2) trigger a filesystem flush would be expensive. The operating system flushes at regular intervals, on umount(2), on system shutdown, and on explicit calls to sync(2) and fsync(2).

If you need to keep the FILE *fd stream open, just call fflush(fd) on it to make sure the operating system has received all the data you passed to fwrite(3) or fprintf(3).

So finally, if you are using <stdio.h> functions, first call fflush() on every FILE * stream you have written to, or call fflush(NULL); to have stdio flush all streams in one call. Then call sync(2) or fsync(2) to make sure all your data is physically on disk. No need to close anything.

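FILE *fd;
...
if (fflush(fd) != 0)          /* stdio buffer -> kernel */
    perror("fflush");
if (fsync(fileno(fd)) != 0)   /* kernel page cache -> disk */
    perror("fsync");
/* if neither call failed, everything up to the last write(2) or
 * fwrite(3) is synced to disk */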
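
The same pattern with syncfs(2), which the question asks about, might look like the sketch below. It assumes Linux with glibc, where syncfs() is declared under _GNU_SOURCE; flush_and_syncfs is a hypothetical helper name. Since syncfs() behaves like sync(2) restricted to the filesystem containing the descriptor, it should also write out pending metadata such as a new directory entry, but that is worth verifying on the target filesystem.

```c
#define _GNU_SOURCE             /* syncfs() is a Linux-specific GNU extension */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push stdio data to the kernel, then ask the kernel to
 * write out the whole filesystem containing the file. Returns 0 or -1. */
int flush_and_syncfs(FILE *fp)
{
    if (fflush(fp) != 0)
        return -1;
    return syncfs(fileno(fp));
}
```

The stream stays open afterwards, so the caller can keep writing or close it as usual.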

By the way, your approach of going through /dev/fd/<number> to get back the descriptor (that you had previously) is flawed for two reasons:

  • Once you close your descriptor, /dev/fd/<number> no longer refers to the file you want. Normally it doesn't even exist. Just try this:

    #include <string.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <errno.h>
    
    int main()
    {
        int fd;
        char fn[] = "/dev/fd/1";
    
        close(1); /* close standard output */
        fd = open(fn, O_RDONLY); /* try to reopen from /dev/fd */
        if (fd < 0) {
            fprintf(stderr,
                    "%s: %s(errno=%d)\n",
                    fn,
                    strerror(errno),
                    errno);
            exit(EXIT_FAILURE);
        }
        exit(EXIT_SUCCESS);
    } /* main */
    
  • You cannot, in general, reliably determine which directory an open file belongs to from the file descriptor alone. A file with multiple hard links can have thousands of directory entries pointing to it. In the traditional Unix model, nothing in the inode (or in the open file structure) records the path that was used to open the file. A common way to use temporary files is to create them and immediately unlink(2) them, so nobody else can open them again. As long as you keep the file open you have access to it, but no path points to it anymore.
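
On Linux specifically, the /proc/self/fd route mentioned in the question does return a path, with caveats. A minimal sketch, assuming Linux with /proc mounted; fd_to_path is a hypothetical helper name:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: look up the path Linux currently associates with an
 * open descriptor. Returns the path length, or -1 on error. */
long fd_to_path(int fd, char *buf, size_t size)
{
    char link[64];

    if (size == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';              /* readlink() does not NUL-terminate */
    return (long)n;
}
```

For an unlinked file the kernel appends " (deleted)" to the reported path, and for a file with several hard links only one name is reported, so the result is a hint rather than a guarantee.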

Luis Colorado
  • Also be careful: forcing syncs on embedded systems that use flash memory is a recipe for pain, with SD cards wearing out and reaching the end of their lifetime prematurely. – Luis Colorado Feb 10 '17 at 08:03
  • Thank you for your lengthy answer, but I asked about syncfs(), not sync() or fsync(). For the file data there is no problem, I can just do fflush() + fsync() (BTW there is no alternative to mixing FILE* and fd here, because the FILE* interface doesn't offer a sync operation). The problem is really the directory entry, which does not get synced when you do sync(), and which may still be updated anyway when you close() the fd (I'm not sure of that). You can actually get the directory by going through /proc/self/fd, on condition that it hasn't been rename()d of course. – Arnout Feb 11 '17 at 12:41
  • @Arnout, just read the manpages, `syncfs(2)` is the same functionality as BSD `fsync(2)`, to sync on secondary storage the data related to an open file. Yesterday I was on a Mac OS X machine, which derives from BSD, and had no access to a Linux system to check the man pages. The `FILE *` alternative offers you a `fileno(3)` function that gives you access to the plain file descriptor, so all the system calls are available there. Anyway, the directory entry is never synced first (the entry is always synced after the file, to ensure filesystem integrity) and is not related to the file you have... – Luis Colorado Feb 12 '17 at 13:12
  • written so far. I'm afraid you don't have many alternatives there. – Luis Colorado Feb 12 '17 at 13:12
  • With `fflush()` you only ensure that your process's data is known to the operating system, not that it is flushed to disk. That is, the library makes sure the `write(2)` system call is issued so the buffered data reaches the kernel, but it does not ensure the data is physically stored on disk. The only way to ensure that data known to the OS is physically on disk is one of the calls `syncfs()`, `sync()` or `fsync()`. `syncfs()` is Linux-specific, the others are widespread across UNIX. `sync()` is file agnostic, doesn't need a file to describe what has to be written and just syncs everything. The others are... – Luis Colorado Feb 12 '17 at 13:27
  • file specific, meaning that only the file passed in descriptor is forced to disk. – Luis Colorado Feb 12 '17 at 13:32

Enable the "sync" mount option for your filesystem (in /etc/fstab); the default is "async" (sync disabled). When this option is enabled, all changes to that filesystem are immediately flushed to disk. This makes your entire filesystem slow, but depending on your embedded system's requirements, it can be a good option to consider.

aicastell
  • Thanks, but that's not an option. It would make each write() block for a relatively long time and we can't afford that. When the file is closed is a good time to sync. – Arnout Feb 09 '17 at 22:04