  • journal mode

data=journal mode provides full data and metadata journaling. All new data is written to the journal first, and then to its final location.

In the event of a crash, the journal can be replayed, bringing both data and metadata into a consistent state. This mode is the slowest, except when data needs to be read from and written to disk at the same time, where it outperforms all other modes. Enabling this mode will disable delayed allocation and O_DIRECT support.

I have a few questions about this:

  1. With data=journal configured, when the user calls write(), does write() return after the data has been successfully written to the journal, or does it return success as soon as the data has entered the page cache? If it is the latter, the journal is committed asynchronously, so the meaning of the journal of ext4 is to ensure the consistency of the file system itself, and there is no guarantee that user data will not be lost?

  2. If ext4 submits the journal asynchronously, when will the journal be triggered?

  3. Is there any other file system that allows the journal to be synchronized before write() returns successfully?

Based on my local experiments, I infer that the journal is committed asynchronously. I used a separate SSD partition as journal_dev. When I ran fio write tests against the file system, I/O on journal_dev was intermittent rather than continuous.

benqwu
  • 1
    "the meaning of the journal of ext4 is to ensure the consistency of the file system itself, and there is no guarantee that user data will not be lost?" - Exactly. Journaling just prevents corruption of the filesystem in cases such as a system crash or power failure. It is not about not losing data. "Is there any other file system that allows the journal to be synchronized before write() returns successfully?" - No, that is not the purpose of a filesystem. If an application cares about a particular write being transferred to the disk, it should call `fsync()` or `sync()`. – Tsyvarev Dec 11 '20 at 09:29
  • 2
    Use O_SYNC to force I/O before the write returns. – stark Dec 11 '20 at 18:29
  • 1
    As for "when will the journal be triggered?", last I checked it was "asynchronously, every 5 seconds", but the number may have changed. And yes, `write()` only guarantees in-memory consistency. – root Dec 14 '20 at 07:07
  • (FYI: fio has [sync](https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-sync)/[fsync](https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-fsync)/[fdatasync](https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-fdatasync) options) – Anon Dec 19 '20 at 18:14

1 Answer

  1. write() returns success to the caller once the data has entered the page cache (assuming you aren't passing any extra flags to open()).
  2. At least periodically (see commit= in https://www.kernel.org/doc/Documentation/filesystems/ext4.txt ), and probably before any pending sync/fsync calls are allowed to complete.
  3. No (otherwise it would defeat the point of buffering).
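
To check which journaling mode and commit interval a mount is actually using, you can look at the option string in /proc/mounts. The following is only a rough sketch: the helper name `ext4_journal_options` and the sample device and mount point are made up for illustration, and options left at their defaults may not be listed at all, so `None` here means "not shown" rather than "not in effect".

```python
def ext4_journal_options(mounts_text):
    """Map each ext4 mount point to its data= and commit= options.

    mounts_text is expected to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue  # skip malformed lines and non-ext4 file systems
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        result[fields[1]] = {
            "data": opts.get("data"),      # None if not listed
            "commit": opts.get("commit"),  # None if not listed
        }
    return result


if __name__ == "__main__":
    # Hypothetical sample line; on a real system read /proc/mounts instead.
    sample = "/dev/sdb1 /mnt/test ext4 rw,relatime,data=journal,commit=1 0 0\n"
    print(ext4_journal_options(sample))
```

On a live system you would pass in `open("/proc/mounts").read()` instead of the sample string.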

If you pass O_SYNC to open(), or call fsync() after writing, the call returns only once your write has made it to stable media, as far as the kernel can know.
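
As a minimal sketch of those two options, here is what they could look like from Python on Linux, using the `os` module's thin wrappers around open(), write() and fsync(). The function names `write_with_fsync` and `write_with_o_sync` are made up for this example, and error handling is kept to the bare minimum.

```python
import os


def _write_all(fd, data):
    """Call write() until every byte has been accepted by the kernel."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_with_fsync(path, data):
    """Buffered write, followed by an explicit fsync() before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)  # returns once the data is in the page cache
        os.fsync(fd)          # blocks until the kernel reports it flushed
    finally:
        os.close(fd)


def write_with_o_sync(path, data):
    """Open with O_SYNC so each write() blocks until the data is flushed."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)
```

With O_SYNC every write() pays the flush cost, while the fsync() variant lets you batch several writes and flush once. Note that for a newly created file, fsync() on the file does not necessarily persist its directory entry; that may need an fsync() on the containing directory as well.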

Anon