
For zpools which are rather full and/or heavily fragmented, I usually enable metaslab debugging (echo metaslab_debug/W1 | mdb -kw) to avoid space map thrashing and the resulting severe write performance hit. The problem itself seems to be old and well understood; a fix has been rumored to be "in the works" for a while now, as has a defrag API which presumably would help as well, yet I could not find an "official" approach that fixes it by default in production code.
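For completeness, this is roughly how I apply the setting, both at runtime via mdb and persistently via /etc/system; the /etc/system line assumes the older, un-split metaslab_debug tunable of the releases I run, so treat it as a sketch rather than something universal:

    # set on the live kernel (as above)
    echo metaslab_debug/W1 | mdb -kw

    # persist across reboots via /etc/system
    # (assumes the pre-split metaslab_debug tunable of older releases)
    set zfs:metaslab_debug = 1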

Is there something I have missed?

Some environment data: my zpools are of moderate size (typically < 10 TB) and mostly serve zfs datasets with zvols using the default block size of 8K (which ends up variable on disk due to the typically enabled compression). Over the years, I have seen this problem appear in different versions of Solaris, especially with aged zpools which have seen a lot of data. Note that this is not the same as the zpool 90% full performance wall, as the space map thrashing due to fragmentation hits at a significantly lower space utilization level (I have seen it occur at 70% on a couple of old pools).
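For what it's worth, when I suspect a pool of being in this state I look at its metaslab layout with zdb; the pool name below is a placeholder and the output format differs between releases, so this is only a sketch:

    # per-vdev metaslab summary with free space per metaslab
    zdb -m tank

    # -mm additionally dumps the space maps themselves (can take a while on large pools)
    zdb -mm tank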

the-wabbit

1 Answer


Unfortunately, in a word: no.

In a longer word: sort of. The method by which ZFS finds free space to use has been altered in recent builds of Open-ZFS to somewhat mitigate the issue; the underlying fragmentation remains, the 'fix' is simply that it has less impact on performance.
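To give a rough idea of the knobs involved on illumos-based Open-ZFS builds (names and defaults vary between platforms and builds, so take these as examples rather than a recommendation):

    # successors to the old metaslab_debug knob:
    echo metaslab_debug_load/W1 | mdb -kw     # load all space maps at pool import
    echo metaslab_debug_unload/W1 | mdb -kw   # never unload space maps once loaded

    # avoid allocating from metaslab groups whose free space drops below this percentage
    echo zfs_mg_noalloc_threshold/W0t10 | mdb -kw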

The only true 'fix' you can use at the moment is to zfs send the data off the pool, wipe the pool out, and zfs send the data back. Obviously the problem will then reappear at a later date, based on your workload and how quickly you fragment the space maps.
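A rough sketch of that procedure, assuming a scratch pool with enough capacity to hold a copy (pool/dataset names and the vdev layout are placeholders, and you would of course verify the copy before destroying anything):

    # 1. recursively snapshot and replicate everything to a scratch pool
    zfs snapshot -r tank@evacuate
    zfs send -R tank@evacuate | zfs receive -F scratch/tank-backup

    # 2. destroy and recreate the original pool with the same layout
    zpool destroy tank
    zpool create tank <vdev specification>

    # 3. send the data back; the rewritten blocks land in freshly laid-out metaslabs
    zfs send -R scratch/tank-backup@evacuate | zfs receive -F tank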

There are other potential fixes/workarounds being discussed/in the works, but I certainly couldn't give any sort of ETA.

Nex7
  • Thank you. I know your involvement mainly focuses on Nexenta and the BSD ports of ZFS; I wonder if anyone happens to have a Solaris-specific answer (somewhat along the lines of "take a look at patchset XY, which fixes issue Z and modifies metaslab caching as follows: ...") – the-wabbit Nov 21 '13 at 20:21
  • Well, I focus on Open-ZFS (www.open-zfs.org) 'ports', and my day job is at Nexenta, but you are correct in that I want nothing to do with the beast that is Oracle and its no-longer-open-source code. Given that they won't show you the code, I'd be worried that they /have/ done things here which, either through an on-disk format change or by virtue of your pool and workload, tie you to them, removing your ability to move off Oracle Solaris if you some day need/want to. But enough proselytizing. Good luck. :) – Nex7 Nov 22 '13 at 20:25
  • @Andrew well, it is just data - I might be unable to zfs send / receive (in fact this has been broken in 11.1 anyway), but I would still be able to pull/extract a tar archive, which is fine for my purposes. FWIW, Oracle is indeed doing "stuff" to ZFS without a great deal of documentation visible to the outside world. They seem to have silently fixed the SATA-Interposer-Expander issue which has been haunting the Nexenta folks for some time (has its cause been found in the meantime?). This might or might not be a good thing. I suppose time will tell. – the-wabbit Nov 22 '13 at 21:10
  • No, that's not 'solved' on illumos. Or on Linux. Honestly, I'm not sure how Oracle could have fixed 'it' (it being a bad, singular-implying word for what is actually a multiple-issue problem). As long as you're comfortable with a high-level (not zfs send|recv) method of file transfer, I guess it's OK. Some people just have too much data and too sensitive a downtime requirement for it, or insufficient budget to double up their storage before replacing the OS, etc. :) – Nex7 Nov 22 '13 at 22:12