4

I am running ext4 filesystems on LVM. The two big speedup options I'm looking at are LVM cache and an external ext4 journal.

It sounds like if I'm using a single SSD for this, LVM cache in writeback mode is the same thing as having your ext4 journal on an external device... basically, if anything happens to it, you reset to the pre-journal state.

Is this a correct interpretation?

RobC
  • Everything is on a single SSD? – Michael Hampton Oct 11 '16 at 00:47
  • @MichaelHampton Yes, or even a RAID 0 array eventually... I'm OK with a few minutes of loss, not with corruption. I think I answered it myself, check it out – RobC Oct 11 '16 at 04:05
  • 1
    I would hope that in this case you mean that the *cache* is on a single SSD, not the entire filesystem backing device. If the entire thing is on an SSD, then there wouldn't be any point to an external journal or cache. – Spooler Oct 11 '16 at 06:38

2 Answers

6

An external journal isn't the same thing as an LVM cache device at all. An LVM cache on an SSD in writeback mode isn't volatile, so the data-integrity concerns aren't enormous (unless the cache device were to fail suddenly, and the cache device can itself be a RAID via Linux MD or similar).

An ext4 intent journal consists of many small writes that benefit from a fast, low-latency storage device, be it external or the same device that the data is stored on. When using rotational media for data disks, this becomes relevant in highly random and transactional workloads.
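For reference, moving the journal onto a fast device could look roughly like the sketch below. The device names (`/dev/vg0/data`, `/dev/sdb1`) and the 4096-byte block size are assumptions for illustration, not values from the question; the journal device's block size has to match the filesystem's. The `run` helper only prints the commands unless `DRY_RUN=0` is set, and the filesystem must be unmounted and clean before doing this for real.

```shell
#!/bin/sh
# Hypothetical device names; substitute your own.
FS_DEV=${FS_DEV:-/dev/vg0/data}        # ext4 filesystem on the LVM volume
JOURNAL_DEV=${JOURNAL_DEV:-/dev/sdb1}  # SSD partition that will hold the journal

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

move_journal_to_ssd() {
    # Format the SSD partition as a dedicated journal device
    # (block size assumed to be 4096 to match the filesystem).
    run mke2fs -O journal_dev -b 4096 "$JOURNAL_DEV"
    # Drop the existing internal journal.
    run tune2fs -O '^has_journal' "$FS_DEV"
    # Attach the external journal to the filesystem.
    run tune2fs -J "device=$JOURNAL_DEV" "$FS_DEV"
}

move_journal_to_ssd
```

The journaling mode itself (for example `data=ordered` or `data=journal`) is then chosen at mount time.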

A writeback cache coalesces writes together so writes are for the most part sequential but makes no distinction between data and metadata, or of the journal in particular. It will stand in front of all writes, caching everything and then queuing writes to the disk in as sequential a manner as possible during a flush given the cached data set. Flush commands are sent at the same time as write barrier commands (at a given interval), ensuring a non-corrupt state on its backing device.

If a writeback cache suddenly and completely dies, you will lose some time on your filesystem but it will still be consistent. (EDIT: this statement is directly disputed in the comments below which warn of severe filesystem corruption.) This can be mitigated with a RAID1 cache device.

If a journal device dies, you will be unable to mount your filesystem until you discard the journal with `tune2fs -f -O ^has_journal /path/to/ext4device` (run as root; `-f` is needed because the external journal is no longer available). You would then have to repair the filesystem with a full fsck scan (which in some cases would take a LOT of time). You would also likely see corruption if this happened during or before a power loss.
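The recovery after a lost journal device might be sketched as follows. The device name `/dev/vg0/data` is a placeholder, and re-adding an internal journal at the end is my own assumption about what you would want next, not a required step. Commands are only printed unless `DRY_RUN=0` is set; the filesystem must be unmounted.

```shell
#!/bin/sh
# Hypothetical device name; substitute your own.
FS_DEV=${FS_DEV:-/dev/vg0/data}   # ext4 filesystem whose external journal was lost

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

drop_lost_journal() {
    # -f lets tune2fs clear has_journal even though the external journal is missing.
    run tune2fs -f -O '^has_journal' "$FS_DEV"
    # Forced full check, since there is no journal left to replay.
    run e2fsck -f "$FS_DEV"
    # Recreate an internal journal so the filesystem is journaled again (optional).
    run tune2fs -j "$FS_DEV"
}

drop_lost_journal
```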

Adam Spiers
Spooler
  • I'm talking ordered mode with async_writeback on the external SSD, the SSD being used only for cache. Data is written to the SSD first, then later written out to the disk, all with the ext4 journal, which to me is nearly identical to LVM cache writeback mode, except LVM is doing block transfers and knows nothing of ext4, so the ext4 external journal is required. It does the exact same thing as LVM writethrough, except at the ext4 level, meaning an SSD crash does not corrupt the filesystem at all; you just lose data over a period of time. – RobC Oct 11 '16 at 18:43
  • It would be the fastest mode here http://raid6.com.au/posts/fs_ext4_external_journal/ I think I'm right on this: if I only have one SSD to speed things up, LVM cache in writethrough mode with the ext4 journal with journal_async_commit in ordered mode will give the exact same performance boost with no chance of corruption, just the chance of lost data. – RobC Oct 11 '16 at 18:46
  • With ext4 it is completely safe to lose the journal, you just lose data. It is not safe to lose the LVM writeback device, since it is block based only, not filesystem aware. – RobC Oct 11 '16 at 18:49
  • While the LVM caching device isn't directly aware of the filesystem above it, it still honors commands sent from the filesystem to it. The best example of this in this case is write barriers. At a write barrier boundary, caches are flushed as required (assuming the cache supports barrier commands). LVM caches support such commands, and will commit data to a backing store on fsync() and data write barriers. The primary benefit of a writeback cache is queuing optimization. – Spooler Oct 11 '16 at 20:10
  • 1
    Thanks, it looks like with one SSD, writeback with boundaries enabled is what I want. No corruption but lost time if something goes down. If I had a second device/array, I could also use that for the external ext4 journal too, but it doesn't make sense with the LVM cache on the same device. With a separate device for the journal in data=journal and journal_async_writeback, it sounds like LVM in writeback mode would be near identical to writethrough mode because the only time data is hitting LVM is after it was already written to the journal device and is being flushed to the actual storage. – RobC Oct 13 '16 at 23:41
  • 3
    This is a VERY DANGEROUSLY wrong answer - dmcache in writeback mode will honor FUA and barriers and other requests, but this will NOT force data to be written back to the backing store. That means that sudden loss of the lvm cache when it is dirty (which might be all the time) can and WILL cause massive filesystem corruption. This is not only documented but easily verifiable - when the cache has dirty blocks that it is writing to the backing store, a sync will return immediately without waiting for them to be written back. Indeed, that is the purpose of the writeback mode. – Remember Monica Dec 09 '17 at 10:46
  • 1
    Also misleading but not dangerously so, writeback mode can reorder writes to be contiguous if possible, but will not, in general coalesce writes together to make them sequential "for the most part". – Remember Monica Dec 09 '17 at 10:51
  • 1
    @MarcLehmann Do you have a reference for the fact that a barrier request (e.g. `fdatasync()`) will NOT force data to be written to the backing store? This behaviour is exactly what I desire (I want `fdatasync()` to return as soon as it's written to the SSD device in front of my spinning disk; of course the SSD cache device is on RAID1 in that case), so it would be great to find upstream documentation or a reference that guarantees that. – nh2 Dec 31 '17 at 05:44
  • Spooler - as your answer is directly disputed by @MarcLehmann, and the safety of readers' data may well depend on the truth, it would be great if you could respond and between the two of you (and any other lvmcache/dmcache experts) come to a consensus on this. Thanks! – Adam Spiers Aug 23 '19 at 10:40
  • Just had a thought - is it possible one of you is talking about dm-writecache, and the other about dm-cache in writeback mode? I know you both refer to "writeback", but that difference might explain why your opinions regarding safety are on opposite ends of the spectrum. – Adam Spiers Aug 23 '19 at 12:45
  • Good thought, but both dm-writecache and dm-cache in writeback mode cache dirty blocks in the cache and ignore FUA w.r.t. the origin device - when the cache device is lost, dirty data and metadata updates are lost. – Remember Monica Sep 27 '19 at 03:54
0

So, I believe the correct solution is to use the LVM cache in writethrough mode, with the ext4 journal on the same device... or a different device in a better setup.

The logic is that ext4 journaling is the only thing that guarantees consistency, so you have to use it. An external SSD device speeds that up greatly. The LVM cache in writeback mode would allow for corruption, since it defers plain block writes. In writethrough mode it still speeds up reads but passes the writes through, which in this scenario ext4 would still put right on the same cache disk, almost the same as writethrough, but everything is guaranteed consistent.
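A minimal sketch of attaching an SSD as a writethrough cache, following the general pattern in lvmcache(7). The volume group `vg0`, logical volume `data`, SSD partition `/dev/sdb2`, pool name `cpool`, and the 20G size are all placeholders I made up for illustration. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical names and sizes; substitute your own.
VG=${VG:-vg0}                  # volume group holding the slow origin LV
LV=${LV:-data}                 # origin LV with the ext4 filesystem
SSD_PV=${SSD_PV:-/dev/sdb2}    # SSD partition to use for the cache
CACHE_SIZE=${CACHE_SIZE:-20G}  # size of the cache pool

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

attach_writethrough_cache() {
    # Make the SSD partition a physical volume and add it to the volume group.
    run pvcreate "$SSD_PV"
    run vgextend "$VG" "$SSD_PV"
    # Create a cache pool that lives only on the SSD.
    run lvcreate --type cache-pool -L "$CACHE_SIZE" -n cpool "$VG" "$SSD_PV"
    # Attach the pool to the origin LV in writethrough mode.
    run lvconvert --type cache --cachepool "$VG/cpool" --cachemode writethrough "$VG/$LV"
}

attach_writethrough_cache
```

In writethrough mode every write reaches the origin before it is acknowledged, so losing the SSD should cost cached reads rather than data.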

I'll wait to vote myself right for a while in case a better response comes by.

RobC
  • Guaranteed consistent as in a power outage won't corrupt anything, but I'm aware I'll still lose whatever is in the journal if I lose that device, probably many minutes of changes. – RobC Oct 11 '16 at 04:03
  • And I'm talking async journal mode, so it writes it to the journal and writes it to the disk at a later time; having it written to the SSD cache first is the speedup. If the cache disk dies you lose the changes, but the filesystem is intact. – RobC Oct 11 '16 at 04:17