
I'm seeing what appears to be a lot of wait time when performing large inserts in MySQL. Am I correct in assuming I have a disk I/O problem? Yes, I am running MySQL on a VM.

InnoDB buffer pool hit rate is 100%
innodb_flush_method = ''
innodb_log_file_size = 250M
innodb_flush_log_at_trx_commit=1
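
For reference, the buffer pool hit rate quoted above is derived from the `SHOW GLOBAL STATUS` counters. A minimal sketch of the calculation (the counter values here are illustrative, not from this server):

```python
# Hypothetical counter values as reported by SHOW GLOBAL STATUS;
# substitute your own.
read_requests = 1_250_000   # Innodb_buffer_pool_read_requests (logical reads)
disk_reads = 0              # Innodb_buffer_pool_reads (reads that missed the pool)

# Hit rate = fraction of logical reads served from memory.
hit_rate = 100.0 * (1 - disk_reads / max(read_requests, 1))
print(f"{hit_rate:.1f}%")  # 100.0%
```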

$iostat -dx 10
Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00  2514.09  0.10 1483.22     0.80 31958.44    21.55    39.04   26.00   0.64  94.91
xvda1             0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvda2             0.00  2514.09  0.10 1483.22     0.80 31958.44    21.55    39.04   26.00   0.64  94.91
dm-0              0.00     0.00  0.10 3983.82     0.80 31870.53     8.00   163.34   39.84   0.24  94.91
dm-1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00  2206.21  0.00 1336.24     0.00 28342.74    21.21    41.85   31.43   0.72  96.02
xvda1             0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvda2             0.00  2206.21  0.00 1336.24     0.00 28342.74    21.21    41.85   31.43   0.72  96.02
dm-0              0.00     0.00  0.00 3542.44     0.00 28339.54     8.00   165.48   47.09   0.27  96.02
dm-1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
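
For scale, iostat's `wsec/s` column counts 512-byte sectors, so the sustained write rate in the first sample works out to roughly 15-16 MiB/s:

```python
SECTOR_BYTES = 512           # iostat reports sectors of 512 bytes
wsec_per_s = 31958.44        # wsec/s for xvda from the first sample above

write_mib_per_s = wsec_per_s * SECTOR_BYTES / 1024**2
print(f"{write_mib_per_s:.1f} MiB/s")  # 15.6 MiB/s
```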

$ top
top - 11:57:19 up 88 days, 11:38,  3 users,  load average: 1.44, 1.54, 1.60
Tasks:  80 total,   2 running,  78 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  1.0%sy,  0.0%ni, 44.5%id, 53.2%wa,  0.0%hi,  0.0%si,  0.7%st
Mem:  16777216k total, 13791264k used,  2985952k free,   166988k buffers
Swap:  2129912k total,       44k used,  2129868k free,  7157368k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3910 mysql     15   0 13.2g 5.6g 6760 S  1.7 34.8   4:50.09 mysqld
    1 root      15   0 10364  740  620 S  0.0  0.0   0:00.29 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:02.77 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.08 events/0
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    7 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    9 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 xenwatch
   10 root      10  -5     0    0    0 S  0.0  0.0   0:01.95 xenbus
   15 root      10  -5     0    0    0 S  0.0  0.0   0:00.04 kblockd/0
   16 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
   20 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
   22 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
   84 root      15   0     0    0    0 S  0.0  0.0   0:00.01 khungtaskd
   87 root      10  -5     0    0    0 S  0.0  0.0   1:37.10 kswapd0
   88 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  218 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  239 root      17  -5     0    0    0 S  0.0  0.0   0:00.00 kstriped
  248 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ksnapd
  259 root      10  -5     0    0    0 S  0.0  0.0   1:56.45 kjournald
  281 root      10  -5     0    0    0 S  0.0  0.0   0:01.04 kauditd
  309 root      16  -4 12632  744  408 S  0.0  0.0   0:00.37 udevd
  636 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kmpathd/0
  637 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kmpath_handlerd
  656 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kjournald
Matt

3 Answers


To answer it simply, yes.

The main indicator is the 53.2% I/O wait time on the CPU. If your I/O wait percentage is greater than 1/(number of CPU cores), your CPUs are spending a significant amount of time waiting for the disk subsystem to catch up.
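
As a rough sketch of that rule of thumb (the core count here is an assumption; check `/proc/cpuinfo` or `nproc` for the real value on your VM):

```python
cores = 2                    # assumed core count for this VM (not from the question)
iowait_pct = 53.2            # %wa from the top output above

threshold_pct = 100.0 / cores      # 1/(number of cores), expressed as a percentage
print(iowait_pct > threshold_pct)  # True: CPUs spend most of their time waiting on disk
```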

Inserts create disk write I/O, and that's generally the worst kind for virtual disks backed by a hard disk drive subsystem. HDDs are notoriously slower at writing than reading. The gap is smaller with SSDs, but writes are still slower than reads. Thus, write operations will have a larger negative impact than reads would.

Also, when you combine a large database (or one under heavy load) with a disk subsystem that isn't designed for heavy I/O and is already handling a large amount of it (like the disk subsystem backing a VM), the problems compound and create major disk I/O issues.

Short of moving the server to a dedicated box with a disk subsystem optimized for SQL (which would be your long-term solution), your best bet is to remove any unnecessary load from the VM (like a web server) and run just the bare OS and SQL on it. Also, make sure you are tuning your indexes and queries, and try to run the majority of the insert queries during off-peak hours, if possible.

tacotuesday

Yes, you are correct: you have a disk I/O problem. By the way, why are you using innodb_flush_log_at_trx_commit=1? It does not perform well under heavy load; innodb_flush_log_at_trx_commit=2 is usually recommended. But I don't think it will help you here: since you have large inserts, I assume the transaction rate is not very high (not hundreds per second). So you should consider improving your disk subsystem by adding more spindles somehow.
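
If you do try it, be aware of the durability trade-off: with a value of 2 the log is written at commit but only flushed to disk about once per second, so an OS crash (not just a mysqld crash) can lose up to roughly a second of committed transactions. A sketch of the my.cnf change:

```ini
# /etc/my.cnf, [mysqld] section
# Flush the InnoDB log to disk ~once per second instead of at every commit.
# Trade-off: an OS crash can lose up to ~1 second of committed transactions.
innodb_flush_log_at_trx_commit = 2
```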

Alex
  • Thanks for the sanity check Alex. You are correct, we do not have a high transaction rate (yet). I would assume setting innodb_flush_method=O_DIRECT wouldn't help much either? – Matt Aug 07 '12 at 22:54
  • It can help if you have a SAN or a RAID controller with BBU and write cache enabled but I don't think you do based on your numbers. In other situations performance can even degrade with O_DIRECT. – Alex Aug 07 '12 at 23:09

Just to be a contrarian, I have to disagree with some of the analysis. First and foremost, I think the iowait time can be extremely misleading. All it really says is that the CPU has nothing else to do while I/O is outstanding. In other words, a low iowait doesn't mean the disk isn't busy or that the system isn't waiting; it just means the CPU has better things to do than hang around twiddling its thumbs while waiting for I/O to complete.

Personally, I find the wait and service times in the iostat output much more informative, even though I'm not a fan of the output format; I find it much harder to read than collectl's output. In any event, the wait times in that output are only on the order of 20-40 ms, which really isn't all that bad.

-mark

Mark J Seger