"Tire bulk api" is taking more time than a river to finish indexing data

Question

Previously I used a JDBC river to index all the data from MySQL into Elasticsearch. I have now switched to the Tire bulk API, because it lets me manipulate the data before indexing it. However, for 3M records, indexing with the Tire bulk API takes about 4 times as long as the JDBC river. Is there a way to make the indexing process faster and more efficient?

score 0 · Accepted Answer · answered Nov 09 '12 at 06:33


IMHO, the key is that the JDBC river runs inside Elasticsearch. After the JDBC request, the data is already in memory and is sent directly to ES.

With an external process, you have one more network hop.

That said, 4 times slower is perhaps too much.
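
Since the extra cost is the network hop, two things that usually help an external indexer are sending many documents per request and reusing one HTTP connection. The following is only a minimal sketch of that idea in plain Ruby, not Tire's own implementation: `bulk_body` and `bulk_index` are hypothetical helper names, and the URL, index name, type name, and batch size of 1000 are assumptions to adjust for your setup.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the newline-delimited body for the Elasticsearch _bulk endpoint:
# one action line followed by one source line per document.
# Assumes every document is a Hash with an 'id' key (an assumption of this sketch).
def bulk_body(index, type, docs)
  docs.flat_map do |doc|
    [
      JSON.generate('index' => { '_index' => index, '_type' => type, '_id' => doc['id'] }),
      JSON.generate(doc)
    ]
  end.join("\n") + "\n"
end

# Send records in batches over a single persistent connection.
# Net::HTTP.start with a block keeps the connection open for all requests inside it.
# An optional block lets you manipulate each record before it is indexed.
def bulk_index(records, index:, type:, url: 'http://localhost:9200', batch_size: 1000)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    records.each_slice(batch_size) do |batch|
      prepared = block_given? ? batch.map { |doc| yield(doc) } : batch
      response = http.post('/_bulk', bulk_body(index, type, prepared),
                           'Content-Type' => 'application/x-ndjson')
      raise "bulk request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
    end
  end
end
```

It could be called as `bulk_index(rows, index: 'articles', type: 'article') { |row| row.merge('title' => row['title'].strip) }`, where `rows` is whatever enumerable your MySQL query returns. The best batch size depends on document size and cluster capacity, so it is worth measuring a few values.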


dadoonet


Not necessarily "too much": it depends on which HTTP client is used (keep-alive), whether it goes over the network vs. the Java API, etc. – karmi Nov 10 '12 at 08:57