
I'm using Java 1.7 and the ES bulk API via the Java client to load JSON documents into an embedded Elasticsearch 1.7.1 instance.

Over the many hours the load takes to run, I see the number of documents loaded per minute gradually fall. I'm doing this on Linux with spinning disks, not an SSD. I'm not sure whether this is expected behaviour (as the underlying Lucene indices grow, later inserts spend more time seeking to find where to insert) or whether something else is going on.

Has anyone else observed this?
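To put a number on the slowdown rather than relying on impressions, one option is to bucket the documents loaded by each bulk request into fixed time windows (for example one minute) and compare the per-window counts over the run. The following is a minimal sketch in plain Java 7; `ThroughputTracker` is a hypothetical name, not part of the Elasticsearch client, and it assumes the loader calls `record` once each bulk request has returned.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the Elasticsearch client): buckets the
 * number of documents loaded into fixed-width time windows so that
 * documents-per-window can be compared across a long-running load.
 */
class ThroughputTracker {
    private final long windowMillis;
    private final long startMillis;
    // Index i holds the documents recorded in window i since startMillis.
    private final List<Long> docsPerWindow = new ArrayList<Long>();

    ThroughputTracker(long windowMillis, long startMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
        this.startMillis = startMillis;
    }

    /** Records that {@code docs} documents finished loading at {@code nowMillis}. */
    void record(long nowMillis, int docs) {
        if (nowMillis < startMillis) {
            throw new IllegalArgumentException("nowMillis is before startMillis");
        }
        int window = (int) ((nowMillis - startMillis) / windowMillis);
        // Pad with empty windows so that windows in which nothing completed show up as zeros.
        while (docsPerWindow.size() <= window) {
            docsPerWindow.add(0L);
        }
        docsPerWindow.set(window, docsPerWindow.get(window) + docs);
    }

    /** Returns a copy of the per-window counts, oldest window first. */
    List<Long> docsPerWindow() {
        return new ArrayList<Long>(docsPerWindow);
    }
}
```

In a loader this would be created once with `System.currentTimeMillis()` as the start time and then called with `System.currentTimeMillis()` and the batch size after each bulk call. If some bulk items can fail, counting only the successful items instead of the batch size would give a more accurate rate.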

Edited by Laurel; asked by Phil.
  • Did you change any settings related to the refresh interval, or to throttling of indexing/merging? – Andrei Stefan Nov 05 '15 at 15:56
  • Yes, I set the following (I'm running a single node, not a multi-node cluster): `"number_of_shards": 1, "number_of_replicas": 0, "refresh_interval": "120s", "indices.store.throttle.max_bytes_per_sec": "40mb", "index.translog.flush_threshold_size": "1g"` – Phil Nov 05 '15 at 17:19
  • and `"index.merge.scheduler.max_thread_count": 1` – Phil Nov 05 '15 at 17:30
  • I'm wondering whether this is related to merging. The observed decay in load rate appears to be logarithmic. – Phil Nov 05 '15 at 17:31
  • Why so few shards? How many CPU cores does your machine (the one the ES node runs on) have? Are you doing any operations on the node other than `bulk`? – Andrei Stefan Nov 06 '15 at 06:31
  • I used few shards because, based on what I'd read and on a small test set, I'd presumed loading would be faster with one shard than with the default 5. Right now, our tests are all about indexing speed; search speed during indexing isn't important. It's a 4-core machine, and that same node also hosts the embedded ES index: one JVM runs the loader, which sends docs via the bulk API to a second JVM running the ES index. At the moment we're not adding other nodes to the ES cluster; we're keeping it small to test. Thanks! – Phil Nov 06 '15 at 11:20
  • Depending on how many shards are in use across the cluster overall, I'd suggest increasing the number of primary shards to 4 and running some more tests. – Andrei Stefan Nov 06 '15 at 12:31
  • I tried increasing the shard count to 4, but the degradation in load rate was similar and overall it took about 10% longer to load the whole doc set. – Phil Nov 09 '15 at 08:54
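The observation that the decay "appears to be logarithmic" can be tested rather than eyeballed by fitting rate = a + b·ln(t) to the measured per-minute rates with ordinary least squares and checking the slope and R². The sketch below is hypothetical plain Java 7 (`LogFit` is an illustrative name); it assumes the time values are positive, e.g. minute numbers starting at 1, because ln(0) is undefined. A good fit would be consistent with the logarithmic description but would not by itself show whether merging, disk seeks, or something else is responsible.

```java
/**
 * Hypothetical helper: ordinary least-squares fit of rate = a + b * ln(t).
 * A clearly negative slope b with R^2 close to 1 means the per-minute rate
 * is well described by logarithmic decay.
 */
final class LogFit {
    final double intercept; // a: fitted rate at t = 1, since ln(1) = 0
    final double slope;     // b: change in rate per unit increase of ln(t)
    final double rSquared;  // fraction of the rate's variance explained by the fit

    private LogFit(double intercept, double slope, double rSquared) {
        this.intercept = intercept;
        this.slope = slope;
        this.rSquared = rSquared;
    }

    /** Fits rate[i] = a + b * ln(t[i]); every t[i] must be positive. */
    static LogFit fit(double[] t, double[] rate) {
        if (t.length != rate.length || t.length < 2) {
            throw new IllegalArgumentException("need at least two (t, rate) pairs");
        }
        int n = t.length;
        double[] x = new double[n];
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            if (t[i] <= 0.0) {
                throw new IllegalArgumentException("t values must be positive");
            }
            x[i] = Math.log(t[i]);
            meanX += x[i];
            meanY += rate[i];
        }
        meanX /= n;
        meanY /= n;

        // Centered sums of squares and cross-products.
        double sxx = 0.0;
        double sxy = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = rate[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("need at least two distinct t values");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        // Residual sum of squares, used for R^2.
        double ssRes = 0.0;
        for (int i = 0; i < n; i++) {
            double residual = rate[i] - (intercept + slope * x[i]);
            ssRes += residual * residual;
        }
        // If every rate is identical there is no variance to explain; treat that as a perfect fit.
        double rSquared = (syy == 0.0) ? 1.0 : 1.0 - ssRes / syy;
        return new LogFit(intercept, slope, rSquared);
    }
}
```

Calling `LogFit.fit` with minute numbers 1, 2, 3, … and the measured documents-per-minute values gives a slope and R² for the run. Fitting the 1-shard and 4-shard runs separately would also let their slopes be compared directly, rather than only their total load times.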
