
A few months ago I used XFS-formatted zram devices strung together with GlusterFS to create a distributed / networked / replicated in-memory filesystem across a few bare-metal servers running RHEL 7.2.

I'm using this FS as a performant way to store, serve and replicate images and videos across my multi-server application. I was unable to find any other in-memory FS solution, so I hacked this one together.
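Roughly, the per-node setup looks like this (a sketch from memory; the device size, hostnames and replica count are illustrative):

    # Create one zram device and give it a fixed size (7G per node in my case)
    modprobe zram num_devices=1
    echo 7G > /sys/block/zram0/disksize

    # Put XFS on it and mount it as a Gluster brick
    mkfs.xfs /dev/zram0
    mkdir -p /bricks/zram0
    mount /dev/zram0 /bricks/zram0

    # From one node: build a replicated volume out of the in-memory bricks
    gluster volume create memvol replica 2 \
        server1:/bricks/zram0/brick \
        server2:/bricks/zram0/brick
    gluster volume start memvol

    # Clients mount it like any other GlusterFS volume
    mount -t glusterfs server1:/memvol /mnt/memvol

The obvious caveat is that none of this survives a reboot, so the zram device, the XFS filesystem and the brick have to be recreated (and the volume healed from the replicas) every time a node comes back up.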

It's been working well for 4 months, but last night one of the servers crashed because of XFS corruption, and I ended up having to do an OS reload. I don't know for sure that this setup was to blame, but the odds are that it was.

Which leads me to...

1) Are there any best practices I should follow to make this setup more stable?

2) Is there any way I can (or even should) set up a logging system so that I can monitor each zram+XFS node's health on an ongoing basis, and know what went wrong if any more crashes happen?

Some performance tests:

/dev/loop0 = a loop-device ramdisk set up following https://erlhelinfotech.wordpress.com/2013/02/20/ramdisk-service-for-systemd/

/dev/zram0 = my zram setup

/dev/sdb2 = a standard 7200rpm disk

Performance was tested with hdparm -Tt.
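The invocations, for each of the three devices above, were simply the cached and buffered read timings:

    hdparm -Tt /dev/loop0
    hdparm -Tt /dev/zram0
    hdparm -Tt /dev/sdb2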

Victor

2 Answers


zram rarely, if ever, gets used at this level. It is possible, though not proven, that you've triggered a bug somewhere in this storage stack.

Much more traditional, and presumably more stable, is to put the block devices on permanent media. You might be surprised at the performance of solid-state storage with plenty of RAM available for caching, with the added bonus that the data is persistent.

You can set yourself up to handle crashes better: remote syslog, remote netconsole, kernel debug packages, and support staff capable of making sense of it all.
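As a rough sketch (the addresses, port and interface name are placeholders, not anything specific to your hosts):

    # /etc/rsyslog.d/remote.conf -- forward everything to a central log host over TCP,
    # then: systemctl restart rsyslog
    *.* @@loghost.example.com:514

    # netconsole: stream kernel console output (including panic traces) to a remote
    # UDP listener, so the last messages survive even if they never reach local disk.
    # Append the target MAC after the trailing slash if the collector is not on the
    # same L2 segment.
    modprobe netconsole netconsole=@/eth0,6666@192.168.1.50/

    # On the collector host, anything that listens on UDP works:
    nc -u -l 6666 | tee netconsole.log

With the kernel debuginfo packages installed, the backtrace from the next crash becomes something you (or Red Hat support) can actually make sense of.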

Don't be afraid to try different components if the current combination is not working: block filesystem, distributed filesystem, kernel version.

John Mahowald
  • Surprisingly, it's been working great for 4 months throughout all my building / testing / debugging. It'll get a full trial by fire in production very soon, though. I also have another GlusterFS volume on regular old disks for permanent storage / backup; the in-memory one just serves "live" files. Files are uploaded straight to memory, which triggers various inotify events that copy them to disk (roughly like the sketch after these comments). – Victor May 22 '17 at 06:33
  • More or less "bootstrapping", so I'm trying to make the best of limited resources, and 32GB of RAM on each server going largely unused otherwise is better than $100 a month for each SSD, haha (running SoftLayer bare metals). – Victor May 22 '17 at 06:34
  • added some performance tests if you want to check it out ^ – Victor May 22 '17 at 07:49
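For reference, the inotify-driven copy mentioned above can be sketched with inotify-tools along these lines (the paths and single event are placeholders, not the actual handler):

    #!/bin/bash
    # Mirror files that land on the in-memory volume onto the on-disk volume
    SRC=/mnt/memvol
    DST=/mnt/diskvol

    inotifywait -m -r -e close_write --format '%w%f' "$SRC" |
    while read -r file; do
        rel="${file#$SRC/}"
        mkdir -p "$DST/$(dirname "$rel")"
        cp "$file" "$DST/$rel"
    done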

In-memory filesystems are not intended for extended operation, but rather for short bursts of high-IOPS activity. Your server probably encountered an out-of-memory condition and, being unable to swap out (due to the memory locked by the ramdrive device), simply crashed.

Anyway, to monitor your servers' health, I suggest using something like Zabbix. You can also create an email alert that triggers on out-of-memory and/or other errors.
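For a quick manual check outside of a full monitoring stack, the zram devices also expose their own counters in sysfs (attribute names vary between kernel versions; the ones below are from the 3.x-era kernels shipped with RHEL 7):

    # RAM actually consumed by the in-memory device
    cat /sys/block/zram0/orig_data_size    # bytes stored, uncompressed
    cat /sys/block/zram0/compr_data_size   # bytes stored, compressed
    cat /sys/block/zram0/mem_used_total    # total RAM used by zram0

    # Has the OOM killer fired since boot?
    dmesg | grep -i "out of memory"
    grep -i oom /var/log/messages

Zabbix can wrap checks like these as user parameters and alert when mem_used_total gets close to the size you allocated to the device.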

shodanshok
  • Each server has 32GB of memory, and I only allocated 7GB on each towards the filesystem, being conscious of that. The server also wasn't even in use when it crashed. I'm nearing the end of debugging / testing and getting ready to move our app into production... so I'm just concerned this will happen again when we're live! – Victor May 21 '17 at 19:50
  • checking out Zabbix – Victor May 21 '17 at 19:51