
I have an AWS EC2 i3.8xlarge with 4 × 1.9 TB ephemeral (instance store) NVMe volumes combined into an mdadm RAID-0 array (yes, RAID 0), formatted with XFS and mounted with the noatime and inode64 options. I am running an I/O-intensive application and the I/O performance is poor; it ran better on EBS storage without RAID. When I run iostat with the -xz options, /dev/md0 shows a high avgqu-sz and 100% utilization all the time. I have tried several tuned profiles, but none of them helped much. Am I missing something? Is this normal performance for ephemeral storage?
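One way to sanity-check these iostat numbers is Little's law: the average number of requests in flight should be roughly the request rate multiplied by the average time each request spends in the system (iostat's await). The Python sketch below applies it to rows copied from the iostat samples in this post; `expected_queue_depth` is a hypothetical helper, and the law only holds approximately on interval averages. For a member NVMe device the prediction matches the reported avgqu-sz, while for md0 about 17 IO/s with a 0.00 ms await cannot produce a queue of about 1.9 million. That mismatch suggests (an assumption, not confirmed in this thread) that md0's in-flight accounting is wrong rather than that the array is actually saturated.

```python
# Little's law sanity check for iostat -x output (approximation on interval averages).
# expected_queue_depth is a hypothetical helper written for this check, not a sysstat API.

def expected_queue_depth(r_per_s: float, w_per_s: float, await_ms: float) -> float:
    """Mean requests in flight ~= completion rate (IO/s) * mean time per IO (s)."""
    return (r_per_s + w_per_s) * await_ms / 1000.0

# nvme0n1 row from the first sample: r/s, w/s, await (ms); reported avgqu-sz was 0.08
nvme0n1 = expected_queue_depth(179.49, 144.75, 0.25)
# md0 row from the second sample; reported avgqu-sz was 1913712.04
md0 = expected_queue_depth(4.00, 13.50, 0.00)

print(f"nvme0n1: predicted {nvme0n1:.2f}, reported 0.08")
print(f"md0:     predicted {md0:.2f}, reported 1913712.04")
```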

The fio and dd benchmarks report very high performance even on the RAID array, but that doesn't match what I see in practice. Charting avgqu-sz over time gives a "mountain"-shaped graph that peaks at around 4 million. mdadm created the array with a 512k chunk size; all other options were left at their defaults.
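For reference, this is how a 512 KiB chunk spreads requests across the four members. The sketch assumes equal-sized members (so md's RAID-0 uses a single striping zone) and ignores the per-member data offset reserved for the superblock; `raid0_map` is a hypothetical helper, not an mdadm API. It also derives the matching XFS stripe geometry: `mkfs.xfs -d su=512k,sw=4` expresses the same layout, and mkfs.xfs normally detects it from an md device on its own.

```python
from __future__ import annotations

CHUNK = 512 * 1024   # mdadm chunk size reported in this post (512 KiB)
MEMBERS = 4          # four NVMe instance-store devices

def raid0_map(offset: int, chunk: int = CHUNK, members: int = MEMBERS) -> tuple[int, int]:
    """Hypothetical helper: map a byte offset on /dev/md0 to (member index, byte offset on member).

    Assumes equal-sized members (single striping zone) and ignores the member data offset.
    The member index is the position in the array, not a device name.
    """
    chunk_no, within = divmod(offset, chunk)    # which chunk, and where inside it
    stripe, member = divmod(chunk_no, members)  # chunks are dealt round-robin to members
    return member, stripe * chunk + within

# XFS stripe geometry matching this layout
su_bytes = CHUNK                  # stripe unit = one chunk
sw = MEMBERS                      # stripe width = number of data disks (RAID 0: all of them)
sunit_sectors = su_bytes // 512   # mkfs.xfs -d sunit= is given in 512-byte sectors
swidth_sectors = sunit_sectors * sw

for off in (0, 512 * 1024, 3 * 512 * 1024 + 10, 4 * 512 * 1024):
    print(off, raid0_map(off))
print("sunit =", sunit_sectors, "sectors, swidth =", swidth_sectors, "sectors")
```

With 4 KiB filesystem blocks, xfs_info would show this geometry as sunit=128, swidth=512 blks, which is a quick way to check the existing filesystem's alignment.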

```
iostat -xz 2
Linux 3.10.0-1160.21.1.el7.x86_64 05/04/2021 _x86_64_ (32 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
27.00 0.00 6.83 0.04 0.00 66.13

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 41.26 2.56 125.23 102.84 3891.36 62.51 0.65 5.08 0.77 5.17 0.07 0.95
xvdf 0.00 24.74 0.04 19.35 0.56 233.91 24.20 0.02 0.85 0.70 0.85 0.55 1.06
nvme0n1 1159.18 779.48 179.49 144.75 5478.32 3692.28 56.57 0.08 0.25 0.33 0.15 0.10 3.23
nvme3n1 1153.67 785.26 179.04 144.76 5454.39 3715.40 56.64 0.10 0.31 0.42 0.17 0.10 3.21
nvme1n1 1156.57 782.38 179.58 144.04 5468.08 3701.01 56.67 0.08 0.25 0.33 0.15 0.10 3.21
nvme2n1 1158.06 780.71 178.81 145.42 5471.01 3699.84 56.57 0.08 0.25 0.33 0.15 0.10 3.22
md0 0.00 0.00 32.43 270.12 1470.69 8902.15 68.57 2.34 0.00 0.00 0.00 3.31 100.00
dm-0 0.00 0.00 0.01 0.27 0.34 3.39 26.33 0.00 1.17 0.73 1.19 0.12 0.00
dm-1 0.00 0.00 0.00 0.34 0.13 2.02 12.36 0.00 0.78 0.95 0.78 0.13 0.00
dm-2 0.00 0.00 0.02 43.46 0.09 228.45 10.51 0.03 0.70 0.64 0.71 0.24 1.05
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 19.45 0.00 0.41 0.51 0.29 0.28 0.00
dm-4 0.00 0.00 0.00 0.01 0.01 0.05 9.82 0.00 0.31 0.65 0.28 0.13 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
38.51 0.00 9.96 0.03 0.00 51.49

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdf 0.00 40.00 0.00 30.00 0.00 324.00 21.60 0.02 0.57 0.00 0.57 0.57 1.70
nvme0n1 0.50 88.50 2.00 18.00 10.00 422.50 43.25 0.00 0.03 0.00 0.03 0.18 0.35
nvme3n1 0.50 88.50 1.00 18.00 6.00 422.50 45.11 0.00 0.05 0.00 0.06 0.16 0.30
nvme1n1 43.00 0.50 14.50 3.00 232.00 10.50 27.71 0.00 0.06 0.07 0.00 0.20 0.35
nvme2n1 43.00 0.50 13.50 3.00 240.00 10.50 30.36 0.00 0.06 0.07 0.00 0.18 0.30
md0 0.00 0.00 4.00 13.50 32.00 420.00 51.66 1913712.04 0.00 0.00 0.00 57.14 100.00
dm-2 0.00 0.00 0.00 70.00 0.00 324.00 9.26 0.04 0.64 0.00 0.64 0.23 1.60

avg-cpu: %user %nice %system %iowait %steal %idle
8.82 0.00 3.90 0.03 0.00 87.24

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 1.00 0.00 278.50 0.00 1534.00 11.02 0.23 0.81 0.00 0.81 0.04 1.10
xvdf 0.00 30.00 0.00 22.50 0.00 244.00 21.69 0.01 0.51 0.00 0.51 0.51 1.15
nvme0n1 0.00 7.00 0.00 2.50 0.00 36.25 29.00 0.00 0.20 0.00 0.20 0.60 0.15
nvme3n1 0.00 123.00 0.00 14.50 0.00 548.25 75.62 0.00 0.17 0.00 0.17 0.17 0.25
nvme1n1 6.00 57.00 2.00 8.50 32.00 260.25 55.67 0.00 0.10 0.00 0.12 0.24 0.25
nvme2n1 6.00 59.00 2.00 6.50 32.00 260.25 68.76 0.00 0.29 0.25 0.31 0.29 0.25
md0 0.00 0.00 0.00 17.00 0.00 544.00 64.00 1913729.56 0.00 0.00 0.00 58.82 100.00
dm-2 0.00 0.00 0.00 52.50 0.00 244.00 9.30 0.03 0.65 0.00 0.65 0.23 1.20

avg-cpu: %user %nice %system %iowait %steal %idle
4.26 0.00 2.85 0.03 0.00 92.86

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.50 0.00 2.00 8.00 0.00 1.00 0.00 1.00 1.00 0.05
xvdf 0.00 34.00 0.00 25.50 0.00 274.00 21.49 0.02 0.63 0.00 0.63 0.63 1.60
nvme0n1 91.50 49.50 37.00 8.50 514.00 230.25 32.71 0.01 0.32 0.39 0.00 0.11 0.50
nvme3n1 49.50 98.50 16.00 19.00 268.00 468.25 42.07 0.01 0.43 0.78 0.13 0.14 0.50
nvme1n1 42.00 61.50 22.00 14.00 256.00 300.25 30.90 0.03 0.93 1.50 0.04 0.12 0.45
nvme2n1 0.00 159.50 0.00 27.00 0.00 744.25 55.13 0.00 0.04 0.00 0.04 0.19 0.50
md0 0.00 0.00 1.00 19.50 10.00 944.00 93.07 1913800.23 0.00 0.00 0.00 48.78 100.00
dm-2 0.00 0.00 0.00 59.50 0.00 274.00 9.21 0.04 0.71 0.00 0.71 0.27 1.60
```
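The md0 svctm column is derived rather than measured: sysstat computes it as busy time divided by completed I/Os, so a %util pinned at 100% simply yields 1000 ms divided by the request rate. The sketch below (`svctm_ms` is a hypothetical helper) reproduces every reported md0 svctm value from r/s, w/s and %util alone. Meanwhile the member NVMe devices behind md0 stay between about 0.15% and 3.2% utilization in the same samples, which is consistent with (though it does not prove) the idea that md0's 100% is an accounting artifact.

```python
# Reproduce sysstat's derived svctm for md0 from r/s, w/s and %util.
# svctm_ms is a hypothetical helper, not part of sysstat.

def svctm_ms(util_pct: float, r_per_s: float, w_per_s: float) -> float:
    """Busy ms per second (util% of 1000 ms) divided by completed IOs per second."""
    iops = r_per_s + w_per_s
    return util_pct * 10.0 / iops if iops else 0.0

samples = [  # md0 rows, in the order shown above: (r/s, w/s, %util, reported svctm)
    (32.43, 270.12, 100.00, 3.31),
    (4.00, 13.50, 100.00, 57.14),
    (0.00, 17.00, 100.00, 58.82),
    (1.00, 19.50, 100.00, 48.78),
]
for r, w, util, reported in samples:
    print(f"{r + w:7.2f} IO/s -> svctm {svctm_ms(util, r, w):6.2f} ms (reported {reported})")
```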
SVH
  • That's an $1,800-a-month server. I suggest you spend the $100 or so to get AWS Business Support for a month; their support is generally excellent and should be able to help you fairly quickly. – Tim May 05 '21 at 08:24
  • Well, I did that, but I am looking for expert comments here as well. Thanks for your response, though. – SVH May 05 '21 at 17:35
  • Good, AWS should be able to sort you out, and they will generally spend as much time as is required to solve a problem. Hopefully someone here can help as well. – Tim May 05 '21 at 19:36
  • When asking a question that involves `fio`, please post your command line and/or job file (it helps passersby!) – Anon May 06 '21 at 19:15

0 Answers