I have some strange IO activity on a server and I can't figure out where it's coming from.
To provide some background, I had to replace an NVMe (Samsung PM81) in a server due to wear. I didn't notice any performance issues, but SMART reported it was time to get a replacement. I did notice some unusual IO activity on the device, but I assumed it was related to the device's wear and didn't think much of it.
Now, with a brand new NVMe (Samsung 980 Pro) and an OS installed from scratch (Debian 10), the IO activity issue still persists.
Here are the contents of /proc/diskstats over a period of 1 minute:
$ cat /proc/diskstats; sleep 1m; cat /proc/diskstats
259 0 nvme0n1 2323590 271 213032732 285413 43708052 69809516 16770577066 269903507 0 901057472 1159862364 0 0 0 0
259 1 nvme0n1p1 2006 0 7264 3665 2 0 2 0 0 44 3080 0 0 0 0
259 2 nvme0n1p2 74879 0 5283682 9424 2001773 386508 28620456 971285 0 455348 825152 0 0 0 0
259 3 nvme0n1p3 2246597 271 207737634 272318 40382341 69423008 16741956608 266611966 0 12038708 266043996 0 0 0 0
259 0 nvme0n1 2323590 271 213032732 285413 43710868 69817259 16771166530 269907653 0 901114568 1159920624 0 0 0 0
259 1 nvme0n1p1 2006 0 7264 3665 2 0 2 0 0 44 3080 0 0 0 0
259 2 nvme0n1p2 74879 0 5283682 9424 2002019 386548 28623272 971330 0 455376 825180 0 0 0 0
259 3 nvme0n1p3 2246597 271 207737634 272318 40384852 69430711 16742543256 266615967 0 12041324 266047732 0 0 0 0
As you can see, it reports nvme0n1 spending over 95 % of the time doing IO ((901114568-901057472)/60000*100)... but the IO usage on the partitions is next to nothing.
Where is the IO being done, then? On the partition table?
Also, the time spent reading (0 ms) plus the time spent writing (4146 ms) does not add up to the time spent doing I/O (57096 ms).
What else is there to do but read and write?
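For reference, the arithmetic above can be reproduced with a short Python sketch. It only subtracts the two snapshots pasted above and is not a general monitoring tool. The function names (`parse`, `summarize`) are my own, and the field positions assume the layout described in the kernel's iostats documentation (time reading, time writing and time doing I/O are the 4th, 8th and 10th counters after the device name).

```python
# Snapshots copied from the /proc/diskstats output above, taken 60 s apart.
BEFORE = """\
259 0 nvme0n1 2323590 271 213032732 285413 43708052 69809516 16770577066 269903507 0 901057472 1159862364 0 0 0 0
259 1 nvme0n1p1 2006 0 7264 3665 2 0 2 0 0 44 3080 0 0 0 0
259 2 nvme0n1p2 74879 0 5283682 9424 2001773 386508 28620456 971285 0 455348 825152 0 0 0 0
259 3 nvme0n1p3 2246597 271 207737634 272318 40382341 69423008 16741956608 266611966 0 12038708 266043996 0 0 0 0
"""
AFTER = """\
259 0 nvme0n1 2323590 271 213032732 285413 43710868 69817259 16771166530 269907653 0 901114568 1159920624 0 0 0 0
259 1 nvme0n1p1 2006 0 7264 3665 2 0 2 0 0 44 3080 0 0 0 0
259 2 nvme0n1p2 74879 0 5283682 9424 2002019 386548 28623272 971330 0 455376 825180 0 0 0 0
259 3 nvme0n1p3 2246597 271 207737634 272318 40384852 69430711 16742543256 266615967 0 12041324 266047732 0 0 0 0
"""

# 0-based positions of the counters that follow the device name
# (assumed from the kernel iostats documentation).
READ_MS = 3   # time spent reading (ms)
WRITE_MS = 7  # time spent writing (ms)
IO_MS = 9     # time spent doing I/O (ms)


def parse(snapshot):
    """Map each device name to its list of integer counters."""
    stats = {}
    for line in snapshot.strip().splitlines():
        fields = line.split()
        stats[fields[2]] = [int(value) for value in fields[3:]]
    return stats


def summarize(before, after, interval_ms):
    """Return per-device deltas and the share of the interval spent doing I/O."""
    old, new = parse(before), parse(after)
    summary = {}
    for device, counters in new.items():
        diff = [n - o for n, o in zip(counters, old[device])]
        summary[device] = {
            "read_ms": diff[READ_MS],
            "write_ms": diff[WRITE_MS],
            "io_ms": diff[IO_MS],
            "busy_pct": 100.0 * diff[IO_MS] / interval_ms,
        }
    return summary


summary = summarize(BEFORE, AFTER, interval_ms=60_000)
for device, row in summary.items():
    print(
        f"{device:10s} read={row['read_ms']} ms "
        f"write={row['write_ms']} ms io={row['io_ms']} ms "
        f"busy={row['busy_pct']:.1f} %"
    )
```

For nvme0n1 this reproduces the figures quoted above: 0 ms reading, 4146 ms writing and 57096 ms doing I/O, which is about 95 % of the 60 s interval.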
There aren't any more partitions or unallocated space on the device:
$ echo p | sudo fdisk /dev/nvme0n1
Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): Disk /dev/nvme0n1: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 980 PRO 2TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0B698EB9-DD2E-4131-9730-4193DD9D5FB5
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 1953791 1951744 953M EFI System
/dev/nvme0n1p2 1953792 197265407 195311616 93.1G Linux filesystem
/dev/nvme0n1p3 197265408 3907028991 3709763584 1.7T Linux filesystem
Command (m for help):
SMART also reports an error, but if I understand it correctly it is simply reporting a missing feature on the device, and not a functional issue:
$ sudo smartctl -a /dev/nvme0n1
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-21-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 980 PRO 2TB
Serial Number: S69ENL0T610188X
Firmware Version: 5B2QGXA7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 6
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 1,736,883,855,360 [1.73 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 b621a0ae58
Local Time is: Tue Sep 27 10:47:54 2022 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057): Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.49W - - 0 0 0 0 0 0
1 + 4.48W - - 1 1 1 1 0 200
2 + 3.18W - - 2 2 2 2 0 1000
3 - 0.0400W - - 3 3 3 3 2000 1200
4 - 0.0050W - - 4 4 4 4 500 9500
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 214,200 [109 GB]
Data Units Written: 16,891,230 [8.64 TB]
Host Read Commands: 2,350,427
Host Write Commands: 42,643,472
Controller Busy Time: 238
Power Cycles: 1
Power On Hours: 262
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 40 Celsius
Temperature Sensor 2: 55 Celsius
Read Error Information Log failed: NVMe Status 0x02
I also checked iotop, but I couldn't see anything relevant:
$ sudo iotop -aoPb -n 2 -d 60
unable to set locale, falling back to the default locale
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 0.00 B/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
Total DISK READ: 0.00 B/s | Total DISK WRITE: 36.82 K/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 46.88 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
649 be/3 root 0.00 B 84.00 K 0.00 % 0.07 % [jbd2/nvme0n1p3-]
396 be/3 root 0.00 B 40.00 K 0.00 % 0.05 % [jbd2/nvme0n1p2-]
31590 be/4 root 0.00 B 0.00 B 0.00 % 0.00 % [kworker/u48:0-flush-259:0]
4761 be/4 root 0.00 B 2.02 M 0.00 % 0.00 % minio server /data
733 be/4 root 0.00 B 12.00 K 0.00 % 0.00 % dcgm-exporter
737 be/4 root 0.00 B 8.00 K 0.00 % 0.00 % nscd
I guess this means the IO is being performed by the kernel itself?
Can anybody help me figure out what is causing this IO activity and how to avoid it? I wouldn't like this NVMe to wear out soon and need replacing again.
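To put the wear concern in numbers, here is a rough Python sketch that turns the SMART counters above into an average write rate and compares it with the sectors written during the one-minute /proc/diskstats sample. It assumes that an NVMe "data unit" is 1000 blocks of 512 bytes (which matches the 8.64 TB that smartctl prints) and that /proc/diskstats counts 512-byte sectors. The variable names are only illustrative.

```python
# Values copied from the smartctl output above.
DATA_UNIT_BYTES = 512 * 1000      # assumption: one data unit = 1000 x 512 bytes
data_units_written = 16_891_230
power_on_hours = 262

# Total bytes written since the drive was installed, and the average rate.
total_bytes_written = data_units_written * DATA_UNIT_BYTES
smart_mb_per_s = total_bytes_written / (power_on_hours * 3600) / 1e6

# Sectors written on nvme0n1 during the 60 s /proc/diskstats sample above.
SECTOR_BYTES = 512                # assumption: diskstats sectors are 512 bytes
sectors_written = 16771166530 - 16770577066
sample_mb_per_s = sectors_written * SECTOR_BYTES / 60 / 1e6

print(f"SMART total written : {total_bytes_written / 1e12:.2f} TB")
print(f"SMART average rate  : {smart_mb_per_s:.2f} MB/s over {power_on_hours} h")
print(f"diskstats sample    : {sample_mb_per_s:.2f} MB/s over 60 s")
```

The SMART figure is an average over the whole power-on time, while the diskstats figure covers a single minute, so the two are only roughly comparable.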