We're running Kafka brokers on machines with:
24 HT cores
256GB RAM
16 x 7.2K RPM NL-SAS disks configured in RAID-10, with a replication factor of 3.
Running sar -q showed runq-sz at ~40 on average and a load average of about 10, while CPU usage was only 10% user and 5% system.
The CPU numbers didn't explain the high runq-sz, so we ran sar -d and saw the following:
- the tps of the device is ~650
- %util is ~30%
- rd_sec/s is ~60K
- wr_sec/s is ~200K
- there's a linear correlation between the tps and the %util.
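
To put those figures in more familiar units, here is a rough back-of-the-envelope sketch in Python. It assumes sar's sectors are 512 bytes (as the sysstat man page states) and treats the approximate values above as exact; the function and key names are made up for illustration. It converts the read and write sector rates to MiB/s, estimates the average request size, and applies the utilization law (utilization = request rate x service time per request) to get the implied service time.

```python
# Back-of-the-envelope figures derived from the sar -d numbers above.
# Assumption: sar reports sectors of 512 bytes.
SECTOR_BYTES = 512


def summarize(tps, rd_sec_per_s, wr_sec_per_s, util_pct):
    """Return throughput, average request size and implied service time."""
    read_mib_s = rd_sec_per_s * SECTOR_BYTES / 2**20
    write_mib_s = wr_sec_per_s * SECTOR_BYTES / 2**20
    # Average sectors per request, converted to KiB.
    avg_req_kib = (rd_sec_per_s + wr_sec_per_s) / tps * SECTOR_BYTES / 1024
    # Utilization law: util (as a fraction) = tps * service time (seconds).
    svc_ms = (util_pct / 100) / tps * 1000
    return {
        "read_mib_s": read_mib_s,
        "write_mib_s": write_mib_s,
        "avg_req_kib": avg_req_kib,
        "svc_ms": svc_ms,
    }


stats = summarize(tps=650, rd_sec_per_s=60_000, wr_sec_per_s=200_000, util_pct=30)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the device is doing roughly 29 MiB/s of reads and 98 MiB/s of writes, at about 200 KiB per request and under half a millisecond of service time per request. If that per-request service time stays roughly constant, %util would scale linearly with tps, which would be consistent with the correlation we observed.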
I have two questions:
1) Is there a way to decrease the tps?
2) If there is, will it also decrease the %util?