
I am trying to migrate (copy) 35 million documents (a standard amount, not too big) from Couchbase to Elasticsearch.

My Elasticsearch (version 1.3) cluster is composed of 3 A3 (4 cores, 7 GB memory) CentOS servers on Microsoft Azure (each server is roughly equivalent to a large server on Amazon).

I used "timing data flow" indexing to store the documents. Each index represents a month and is composed of 3 shards and 2 replicas.

When I start the migration script, insertion becomes very slow (about 10 documents per second) and the load average on each server in the cluster jumps above 1.5. In addition, JVM memory usage climbs to almost 100%, while the CPU shows 20% and IOPS peaks at 20. (I used Marvel CNC to collect all of this data.)

  1. Has anyone faced this kind of indexing problem in Elasticsearch?
  2. Are there any parameters I should be aware of for increasing the Java memory?
  3. Are my cluster specifications good enough to handle 100 indexing operations per second?
  4. Does the indexing time depend on how big the index is? And should it be this slow?

Thanks, Niv

Niv Penso

1 Answer


I am quoting an answer I got in the Google group (link):

A couple of suggestions:

  1. Disable replicas before inserting large amounts of data (set the replica count to 0), and only re-enable them afterwards.

  2. Use batching. The right batch size depends on many factors (document sizes, network, instance strength).

  3. Follow ES's advice on node setup, e.g. allocate 50% of the available memory to the ES Java heap, don't run anything else on that machine, and disable swappiness.

  4. Your index is already sharded; try spreading the shards across 3 different servers instead of keeping them all on one server ("virtual shards"). This helps fan out the indexing load.

  5. If you don't specify the document IDs yourself, make sure you use the latest ES. There is a significant improvement in its ID generation mechanism that could help speed things up.
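
A minimal sketch of points 1 to 3, assuming the standard ES 1.x REST endpoints (`PUT /<index>/_settings` for the replica count and `POST /_bulk` for batching). It only builds the request bodies with the Python standard library; sending them is left to whatever HTTP client you use. The index name `events-2014-09`, the type `event`, the batch size, and all helper names are hypothetical, chosen only for illustration.

```python
import json


def replica_settings_body(replicas):
    """JSON body for PUT /<index>/_settings that sets the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


def bulk_body(index, doc_type, docs):
    """NDJSON body for POST /_bulk: an action line followed by a source line per doc.

    `docs` is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def heap_size_mb(total_ram_mb):
    """Half of the machine's RAM, per the 50% guideline in point 3."""
    return total_ram_mb // 2


if __name__ == "__main__":
    # Hypothetical sample documents standing in for the Couchbase export.
    docs = [(str(i), {"value": i}) for i in range(5)]

    print("before load:", replica_settings_body(0))
    for chunk in batches(docs, 2):  # batch size 2 only for the demo
        print(bulk_body("events-2014-09", "event", chunk), end="")
    print("after load:", replica_settings_body(2))
    print("heap for a 7 GB node (MB):", heap_size_mb(7 * 1024))
```

The batch size is something to tune by measurement rather than a fixed value; the heap figure is the kind of value that would go into `ES_HEAP_SIZE` on an ES 1.x node.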

I applied points 1 and 3, and it seems the problem is solved :) I am now indexing at a rate of 80 docs per second and the load average is low (0.7 at most).
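
For a sense of scale, here is a rough back-of-the-envelope calculation of how long the full 35 million document migration would take at the rates mentioned in this thread (10, 80, and the 100 docs per second asked about in the question). It assumes a constant rate with no pauses, which a real migration will not have; the function name is hypothetical.

```python
TOTAL_DOCS = 35_000_000
SECONDS_PER_DAY = 86_400


def migration_days(total_docs, docs_per_second):
    """Days needed to index total_docs at a constant docs_per_second rate."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


for rate in (10, 80, 100):
    print(f"{rate} docs/s -> {migration_days(TOTAL_DOCS, rate):.1f} days")
```

At 10 docs per second this works out to roughly 40 days, versus about 5 days at 80 docs per second.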

Credit goes to Itamar Syn-Hershko, who posted this reply.

Niv Penso