We performed a Kafka benchmark (BM) to determine the maximum throughput (TP) achievable with the given Kafka brokers and disks.
Kafka broker setup (machine spec & disks):
3 Kafka brokers, each with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (8 cores).
Each broker has an sdb device of 14.6 TB mounted at /var/kafka.
The sdb device is composed of 16 SAS disks of ~1 TB each in RAID-10, which means 8 of the 16 disks hold mirror copies (RAID-10 uses mirroring, not parity), so only half of the raw capacity is usable.
Kafka producer configuration:
key=string, value=byteArray
enable.auto.commit=false
buffer.memory=500000000
batch.size=262144
retry.backoff.ms=5
linger.ms=20000
retries=0
compression.type=lz4
acks=1
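For reference, here is a minimal sketch of how this configuration might be assembled with the standard Java producer client; the class name and bootstrap servers are placeholders. Note that enable.auto.commit is a consumer-side property, not a producer one, so it has no effect in the list above and is omitted from the sketch:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BenchmarkProducer {
        // Builds a producer with the benchmark settings listed above.
        public static KafkaProducer<String, byte[]> build(String bootstrapServers) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());      // key=string
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName()); // value=byteArray
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 500000000L);  // buffer.memory
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 262144);         // batch.size
            props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 5L);       // retry.backoff.ms
            props.put(ProducerConfig.LINGER_MS_CONFIG, 20000L);          // linger.ms
            props.put(ProducerConfig.RETRIES_CONFIG, 0);                 // retries
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");    // compression.type
            props.put(ProducerConfig.ACKS_CONFIG, "1");                  // acks
            return new KafkaProducer<>(props);
        }
    }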
Kafka topic configuration:
100 partitions, balanced across all 3 brokers
replication factor = 3
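One way such a topic could be created, assuming a client version that ships the Java AdminClient API; the topic name and broker address here are placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateBenchmarkTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                // 100 partitions, replication factor 3
                NewTopic topic = new NewTopic("bm-topic", 100, (short) 3);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }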
How the Kafka BM was performed:
We injected messages using a proprietary tool, KakkaInjector.
The messages were ~1 KB in size and were sent evenly across all 100 partitions for 2.5 consecutive hours.
The goal of the BM was to find the maximal TP achievable without exceeding ~80-85% I/O utilization.
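The actual KakkaInjector tool is proprietary, but a rough sketch of the send pattern just described (~1 KB payloads, round-robin over the 100 partitions, fixed duration) could look like the following, reusing the producer builder sketched earlier; all names are illustrative:

    import java.util.Random;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class InjectorSketch {
        public static void main(String[] args) {
            KafkaProducer<String, byte[]> producer = BenchmarkProducer.build("broker1:9092"); // placeholder
            byte[] payload = new byte[1024];                  // ~1 KB message body
            new Random().nextBytes(payload);
            long endTime = System.currentTimeMillis() + (long) (2.5 * 60 * 60 * 1000); // run 2.5 hours
            int partition = 0;
            while (System.currentTimeMillis() < endTime) {
                // round-robin over all 100 partitions so the load stays equal
                producer.send(new ProducerRecord<>("bm-topic", partition,
                        Long.toString(System.nanoTime()), payload));
                partition = (partition + 1) % 100;
            }
            producer.close();
        }
    }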
Kafka BM results (throughput and I/O utilization):
At ~85% I/O utilization on all 3 brokers, the rate was 550,000 msgs/sec read and 550,000 msgs/sec written.
Looking at the TP in byte terms, the 3 brokers together reached a total of 380 rkB/s and 495 wkB/s.
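For scale, 550,000 msgs/sec at ~1 KB per message corresponds to roughly 550 MB/s of uncompressed payload in each direction, before lz4 compression and the extra inter-broker traffic from replication factor 3 are taken into account.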
My questions:
These results were achieved with 3 Kafka brokers × 16 SAS disks × ~1 TB. We want to reach ~1.5M messages/sec instead of the current rate of ~550K msgs/sec.
So my questions are:
Will adding more disks to each broker linearly increase the number of msgs read and written?
Will adding more brokers with the same disk setup linearly increase the number of msgs read and written?
If we change the RAID level from RAID-10 to RAID-0, will the TP increase by 2x?
If we change the disks from SAS to SSD, will the TP increase?