We regularly see a high load average. With the help of a Zabbix agent, I narrowed it down to CPU iowait, and ultimately to disk I/O. I am not allowed to install any additional packages on the server, but I have root rights and want to investigate the issue. By now I know which partitions are affected.
Tools like iostat, iotop, and sar are not available, so I looked for the (pseudo-)files that those tools read their data from. Since we use RAID, I first looked into /proc/mdstat, which gives the mapping of our /dev/mdX devices to the disk partitions. Then I looked into /proc/diskstats, and with the help of https://www.kernel.org/doc/html/latest/admin-guide/iostats.html I could identify the partitions with the most I/O.
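
For illustration, this is roughly how I read the counters by hand: a minimal sketch, assuming a Python 3 interpreter is already present on the server (nothing gets installed). The function names parse_diskstats, busiest, and watch are made up for this example; the field positions follow the kernel iostats document linked above, and sectors are counted in 512-byte units.

```python
import time

# Whitespace-split positions in a /proc/diskstats line:
# 0 major, 1 minor, 2 device name, 3 reads completed, 5 sectors read,
# 7 writes completed, 9 sectors written, 12 milliseconds spent doing I/O.
def parse_diskstats(text):
    """Turn the contents of /proc/diskstats into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        stats[fields[2]] = {
            "sectors_read": int(fields[5]),
            "sectors_written": int(fields[9]),
            "io_ms": int(fields[12]),
        }
    return stats


def busiest(before, after, interval):
    """Return (device, read KiB/s, write KiB/s, busy %) sorted by busy %."""
    rows = []
    for dev, now in after.items():
        old = before.get(dev)
        if old is None:
            continue  # device appeared between the two snapshots
        read_kib = (now["sectors_read"] - old["sectors_read"]) * 512 / 1024 / interval
        write_kib = (now["sectors_written"] - old["sectors_written"]) * 512 / 1024 / interval
        busy = (now["io_ms"] - old["io_ms"]) / (interval * 1000) * 100
        rows.append((dev, read_kib, write_kib, busy))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def watch(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and print the deltas."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    for dev, read_kib, write_kib, busy in busiest(before, after, interval):
        print("%-12s %10.1f KiB/s read %10.1f KiB/s write %6.1f %% busy"
              % (dev, read_kib, write_kib, busy))
```

Calling watch(5) prints one line per device and partition, busiest first. This only tells me which block devices are busy, not which files or processes cause the I/O.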
How can I narrow it down to files or processes from here? Could lsof be helpful? It is available.