
When I run one instance of Elasticsearch, I can index at ~6,000 EPS. When I start a second Elasticsearch instance on the same server and join it to the cluster, my indexing speed increases to ~10,000 EPS. In other words, a single instance of Elasticsearch does NOT utilize all of the CPU or disk I/O that the server has available, and even with two instances running, not all resources are utilized. It seems like there is some sort of throttling somewhere, and I'd like to change it. The primary use of this node will be indexing.

Single ES on server: ~6000 EPS

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.45    0.00    3.87    6.26    0.00   60.43

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00   733.13    0.00  800.60     0.00     6.48    16.59     1.75    2.19    0.00    2.19   0.89  71.22

Dual ES on server: ~10,000 EPS

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          52.87    0.00    5.22    5.41    0.00   36.49

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00  1076.40    0.00  989.40     0.00     9.75    20.18     2.15    2.17    0.00    2.17   0.89  88.32

Maybe useful notes:

  • Both ES instances are stock ES installs with the only change being an increased JVM heap size.
  • I have TBs of logs that look like the below:

{"timestamp":"1541290120","computername":"somenamehere","type":"server","owner":"somenamehere"}

  • Disks are SSDs in software raid0. A FIO 512B write test shows IOPS=46.4k, BW=22.7MiB/s and for 4k, IOPS=46.1k, BW=180MiB/s.
  • I use Logstash to read events from a file and send them to ES. The doc ID is created within Logstash. Stock tar.gz Logstash with a default logstash.yml, excluding the config for X-Pack monitoring.
  • I provide a mapping up front but it isn't static.
  • System swap is turned off.
  • Index refresh_interval is 90s.
  • number_of_replicas is set to 0.
  • _nodes stats shows "total_indexing_buffer": 1062404096
  • Index rate is measured via Kibana X-Pack monitoring.
  • Elasticsearch 6.4.2 and Logstash 6.4.2
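
For reference, the index settings from the notes above correspond to something like this (Kibana Dev Tools syntax; the index name is a placeholder, and the shard count of 5 is the ES 6.x default):

```json
PUT logs-index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "refresh_interval": "90s"
  }
}
```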

Is there a limiter somewhere that I need to change?

1 Answer


First, please have a look at my answer from yesterday, where I explain how indexing works: ElasticSearch - How does sharding affect indexing performance?.

Have you already seen this (https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html) and this (https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html)?

I don't know the shard count for your index, but lowering the number of primary shards can improve your indexing speed.
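
For example, you could create a new index with a single primary shard and reindex into it (a sketch; the index name is a placeholder and your mapping is omitted):

```json
PUT logs-single-shard
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
```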

ibexit
  • Thank you for the response. Could you please connect the dots for me on this one? Your response tells me to change my sharding to increase indexing speed, and based on your links and your other answers, this all sounds very reasonable. However, I'm really trying to understand why running two ES nodes on the same server nearly doubled my indexing speed. I assume that if I change my shards from 5 to 1, indexing speed will increase, and if I then run two ES nodes on the same server, I wonder if it'll nearly double again as presented in my original question. – helpzmepleasekthxbye Nov 09 '18 at 22:35
  • Since you generate the ID in Logstash, the distribution of the docs across the shards can be suboptimal, which leads to write waits. Fewer shards, better distribution. Consider using ES's built-in ID generation - if you need to keep a special ID, just introduce a legacy_id field in your doc. Have you tried reducing the shard count? What's the outcome? Can you share your ES config? Please provide your cluster settings too. – ibexit Nov 10 '18 at 20:31
  • And what is the CPU setup on this particular host? Do you have any custom plugins enabled in your ES? – ibexit Nov 10 '18 at 20:48
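
A minimal sketch of the Logstash change suggested in the comments: omit the document_id option from the elasticsearch output so Elasticsearch auto-generates IDs (append-only friendly), and keep the original value in a separate field. The hosts, index name, and the id/legacy_id field names are placeholders:

```
filter {
  # keep the previously generated id as a normal field instead of using it as _id
  mutate { rename => { "id" => "legacy_id" } }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    # no document_id set => Elasticsearch generates the _id itself
  }
}
```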