
I've got a general-purpose server, providing mail, DNS, web, databases, and some other services for a number of users.

It's got a Xeon E3-1275 at 3.40 GHz and 16 GB of ECC RAM, running Linux kernel 4.2.3 with ZFS-on-Linux 0.6.5.3.

The disk layout is 2x Seagate ST32000641AS 2 TB drives and 1x Samsung 840 Pro 256 GB SSD.

I've got the 2 HDs in a RAID-1 mirror, and the SSD is acting as a cache and log device, all managed in ZFS.
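
For context, the layout is roughly equivalent to the following (a sketch, not my literal command history; this is a root pool so the real setup involved more steps, and the SSD partition numbers are assumptions):

# two HDDs as a mirrored pair
zpool create zroot mirror /dev/sdb /dev/sdc
# SSD split into a small log (SLOG) partition and a larger cache (L2ARC) partition (partition numbers assumed)
zpool add zroot log /dev/sda1
zpool add zroot cache /dev/sda2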

When I first set up the system, it was amazingly fast. No real benchmarks, just... fast.

Now, I notice extreme slowdowns, especially on the filesystem holding all of the maildirs. Doing a nightly backup takes over 90 minutes for a mere 46 GB of mail. Sometimes, the backup causes such an extreme load that the system is nearly unresponsive for up to 6 hours.

I've run `zpool iostat zroot` (my pool is named zroot) during these slowdowns, and seen writes on the order of 100-200 kB/s. There are no obvious I/O errors, and the disks don't seem to be working particularly hard, but reads are almost unusably slow.
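
If anyone wants to see exactly what I'm watching, it's essentially this (the 5-second interval is arbitrary; -v just breaks the numbers out per device):

# per-device read/write stats, refreshed every 5 seconds
zpool iostat -v zroot 5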

The strange thing is that I had the exact same experience on a different machine, with similar spec hardware, though no SSD, running FreeBSD. It worked fine for months, then got slow in the same way.

My going suspicion is this: I use zfs-auto-snapshot to create rolling snapshots of each filesystem. It creates 15-minute, hourly, daily, and monthly snapshots, and keeps a certain number of each around, deleting the oldest. It means that over time, thousands of snapshots have been created and destroyed on each filesystem. It's the only ongoing filesystem-level operation that I can think of with a cumulative effect. I've tried destroying all of the snapshots (but kept the process running, creating new ones), and noticed no change.
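
For scale, this is roughly what I used to count the snapshots and then clear them out (a sketch; zroot/mail is just one example dataset):

# total number of snapshots across the pool
zfs list -H -t snapshot -o name | wc -l
# destroy every snapshot under one filesystem (here, the maildir dataset)
zfs list -H -t snapshot -o name -r zroot/mail | xargs -n1 zfs destroy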

Is there a problem with constantly creating and destroying snapshots? I find having them an extremely valuable tool, and have been led to believe that they are (aside from disk space) more or less zero-cost.

Is there something else that may be causing this problem?

EDIT: command output

Output of `zpool list`:

NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot  1.81T   282G  1.54T         -    22%    15%  1.00x  ONLINE  -

Output of `zfs list`:

NAME             USED  AVAIL  REFER  MOUNTPOINT
zroot            282G  1.48T  3.55G  /
zroot/abs       18.4M  1.48T  18.4M  /var/abs
zroot/bkup      6.33G  1.48T  1.07G  /bkup
zroot/home       126G  1.48T   121G  /home
zroot/incoming  43.1G  1.48T  38.4G  /incoming
zroot/mail      49.1G  1.48T  45.3G  /mail
zroot/mailman   2.01G  1.48T  1.66G  /var/lib/mailman
zroot/moin       180M  1.48T   113M  /usr/share/moin
zroot/mysql     21.7G  1.48T  16.1G  /var/lib/mysql
zroot/postgres  9.11G  1.48T  1.06G  /var/lib/postgres
zroot/site       126M  1.48T   125M  /site
zroot/var       17.6G  1.48T  2.97G  legacy

This is not a very busy system, in general. Peaks on the graph below are nightly backups:

[Graph: IO statistics]

I've managed to catch the system during a slowdown (starting around 8 this morning). Some operations are fairly responsive, but the load average is currently 145, and `zpool list` just hangs. Graph:

[Graph: /dev/sdb latency]

  • Please show `zpool list` and `zfs list`. – ewwhite Oct 23 '15 at 22:50
  • Is your pool nearly 80% full? That could cause problems. – Ryan Babchishin Oct 24 '15 at 01:40
  • Oh no... ZFS root on Linux. Hmm... Have you done _any_ tuning? Also, you may be suffering from fragmentation. What's your ZoL version? Have you updated at all? – ewwhite Oct 24 '15 at 02:52
  • If I'm reading things correctly, zpool is version 28, zfs is version 5. Not close to 80% full (more like 16% full?). ZoL is latest, 0.6.5.3. – Kaolin Fire Oct 24 '15 at 04:35
  • It was also suggested that the SSD might be failing under heavy use as log, but SMART says it's doing well, I think. Reallocated_Sector_Ct 0, Wear_Leveling_Count raw value 402 (and value is 88), no errors... – Kaolin Fire Oct 24 '15 at 04:50
  • and dedup is off (hence the 1.00x?). But double-checked. :) – Kaolin Fire Oct 24 '15 at 05:24
  • @KaolinFire I'm curious if an SSD acting as both a log and cache could be overloaded, and become your bottleneck. I was wondering the same thing about my array that was setup the same way. Have you tested without the SSD? – Ryan Babchishin Oct 24 '15 at 05:33
  • @RyanBabchishin I'm pretty certain that it's not the SSD. The system's just not that busy by ZFS standards (or really any standards). There's constant IO, but on the order of 10-20kbytes/sec. (Note, on the chart above, sda is the SSD, and sdb/sdc are the hard drives). – squidpickles Oct 24 '15 at 06:34
  • @ewwhite no tuning. I would also suspect ZFS-on-Linux, but I had identical problems on FreeBSD. Again, the server would just randomly slow down to the point of a simple "ls" taking 1-2 minutes in a directory with 10 files. Rebooting always fixes the problem. – squidpickles Oct 24 '15 at 06:35
  • Reduce your ARC to 30-40% of ram. – ewwhite Oct 24 '15 at 11:00
  • Or you may just be too fragmented. – ewwhite Oct 24 '15 at 11:18
  • @ewwhite Trying with ARC max at 6 GB. It can take a while (a week or two) for the problem to show up, so this may have to go on hold for a bit... – squidpickles Oct 24 '15 at 19:12
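
For anyone following along, my understanding is that the ARC suggestion from the comments translates to ZFS-on-Linux roughly like this (a sketch; 6442450944 is 6 GiB in bytes, matching the value I mentioned above):

# persistent: add to /etc/modprobe.d/zfs.conf so it applies at module load
options zfs zfs_arc_max=6442450944
# runtime: apply immediately without a reboot
echo 6442450944 > /sys/module/zfs/parameters/zfs_arc_max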

1 Answer


Look at arc_meta_used and arc_meta_limit. With lots of small files, you can fill up the metadata cache in RAM, forcing ZFS to go back to disk for file information, which can slow the world to a crawl.

I'm not sure how to do this on Linux; my experience is on FreeBSD.
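
For reference, a rough sketch of where these counters live (the Linux path is the ZoL kstat file; the FreeBSD sysctl names are from memory, so treat them as an assumption):

# Linux (ZFS-on-Linux): compare metadata usage against its limit
grep -E 'arc_meta_(used|limit)' /proc/spl/kstat/zfs/arcstats
# FreeBSD: the same counters exposed as sysctls (names assumed)
sysctl kstat.zfs.misc.arcstats.arc_meta_used kstat.zfs.misc.arcstats.arc_meta_limit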

  • Interesting—thanks! Adding https://github.com/zfsonlinux/zfs/issues/1261 for reference. `cat /proc/spl/kstat/zfs/arcstats | grep arc_meta_used` gives `arc_meta_used 4 5895985776`, and `cat /proc/spl/kstat/zfs/arcstats | grep arc_meta_limit` gives `arc_meta_limit 4 6301327872`. – Kaolin Fire Oct 26 '15 at 20:40
  • Looking at the disk IO rates, though, it doesn't seem there's actually much physical disk activity. – squidpickles Nov 04 '15 at 18:57