3

We are experiencing "pulsing" writes to disk (from 1 writes out/sec pulsing up to 142+ writes out/sec) around every 10 seconds.

See this example image: https://discussions.apple.com/servlet/JiveServlet/showImage/2-22394173-269851/Screen+Shot+2013-07-03+at+13.22.28.png

We dug into these "pulsing" writes and found that they happen exactly the same time as these errors from IOTOP:

dtrace: error on enabled probe ID 5 (ID 992: io:mach_kernel:buf_strategy:start): illegal operation in action #3 at DIF offset 0

The "pulsing" only happens when the error above presents itself in IOTOP.

Note: we are running Apple RAID software mirroring for two drives.

Any suggestions, help and tips will be greatly appreciated. Thanks in advance.

Charles
  • 50,943
  • 13
  • 104
  • 142
Ted Wilmont
  • 463
  • 2
  • 9
  • 20

1 Answers1

3

The pulsing I/O pattern you're seeing is characteristic of applications where many/most filesystem writes are asynchronous - this is because the filesystem will batch up the writes so it can do many at the same time to avoid doing one disk seek per write. The most common example I can think of is a database writing data - except for the database's write-ahead log, everything is typically written asynchronously; other transactional access patterns tend to be similar because they have a write-ahead log to recover if some async writes are lost in a crash. This is a common access pattern and isn't necessarily a problem, but it can become a problem when your disk is highly fragmented and the filesystem can't write everything out in batches (causing many seeks, just like it was trying to avoid).

The DTrace/iotop error you're seeing means there's either a bug in the DTrace implementation itself or in the iotop DTrace script. Looking at iotop's source code (in /usr/bin/iotop on OS X), there are three io:::start callbacks which could be the culprit. It's possible that there's some sort of null pointer access in the script for some types of I/O, but it doesn't look likely based on the script and the arguments io:::start probes take. Perhaps this is best resolved with a bug report to Apple.

Dan
  • 7,155
  • 2
  • 29
  • 54
  • Thank you very much for your informative comment. I've looked into it a bit deeper and it does seem that it's MySQL causing these heavy pulsing writes. When a table has an entry added or updated, does it have to re-write the entire table to disk or could it append/modify existing data? We are using MyISAM tables. We are also using SELECT queries with ORDER BY and TEXT fields, which of course require a write to temporary disk table, however they wouldn't be polled, they would be written as the request comes in. Any help in diagnosing this is greatly appreciated. – Ted Wilmont Aug 12 '13 at 16:20
  • OK so digging even deeper, I've found the EXACT cause of the problem. It seems that launchd in OS X is writing to disk (perhaps disk journaling in OS X) with this: `17:55:09.274758 IOCTL /dev/disk2 0.073689 W launchd.263*` – Ted Wilmont Aug 12 '13 at 16:56