
I am using ElasticSearch to index some data, but I found that the performance is not very efficient.

There are only 3000 entries, and each entry has 6 columns. It takes 5 minutes to index these 3000 entries.

Because I am new to ElasticSearch, my code and program flow are basic, as follows:

  1. Search and check whether any identical data already exists.
  2. If the same data exists, update it.
  3. If not, add it.

The code is as follows:

import pyes
conn = pyes.ES('server:9200')

Search:

searchResult = conn.search(searchDict, indexName, TypeName)

Index:

conn.index(storeDict, indexName, TypeName, id)

Update the Count in the indexed data:

 conn.partial_update(indexName, TypeName, id, "ctx._source.Count += counter", params={"counter" : 1})
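
Putting those pieces together, the whole flow looks roughly like this (a minimal sketch; Name, Count, and the index/type names are just placeholders for my real fields):

    import pyes

    conn = pyes.ES('server:9200')
    indexName, TypeName = "myindex", "mytype"   # placeholders

    def store_entry(entry, id):
        # 1. Search for an existing document (hypothetical "Name" field as the key).
        searchDict = {"query": {"term": {"Name": entry["Name"]}}}
        searchResult = conn.search(searchDict, indexName, TypeName)

        if searchResult.total > 0:   # total hits (attribute name may differ across pyes versions)
            # 2. Same data already indexed: bump its counter with a scripted partial update.
            conn.partial_update(indexName, TypeName, id,
                                "ctx._source.Count += counter",
                                params={"counter": 1})
        else:
            # 3. Not found: index it as a new document.
            conn.index(entry, indexName, TypeName, id)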

Is there any way to improve the performance of my code?

Thank you for your help.

Jimmy Lin
  • Could you make the title of the question a little more descriptive? It seems more about improving the way you use Elasticsearch in your application than about improving its own performance. – javanna Jul 26 '13 at 08:50

2 Answers


You don't need to search before updating. Read the ES docs on updating and scroll down to the upsert section. `upsert` is a parameter that holds a document to use if the document does not exist on the server; otherwise, the upsert is ignored and the request works like a normal update (as you are doing now).
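
For example, something like this should work with pyes (a rough sketch; I'm assuming your version of pyes exposes an upsert argument on partial_update, and the Count field is just taken from your question):

    # One round trip: if the document is missing, the "upsert" body is indexed as-is;
    # if it already exists, the script runs against it instead.
    conn.partial_update(indexName, TypeName, id,
                        "ctx._source.Count += counter",
                        params={"counter": 1},
                        upsert={"Count": 1})   # put your full document (all 6 columns) here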

Good luck!

ramseykhalaf
  • Hi, it's me again. I revised my code with upsert and it's better: it now takes 3 minutes to finish. Is there any way to get the time under 1 minute? – Jimmy Lin Jul 25 '13 at 06:49
  • I changed the config file in /bin/elasticsearch.in.sh, but it doesn't seem to work even after I restart Elasticsearch. How can I get Elasticsearch to read the new settings file? – Jimmy Lin Jul 25 '13 at 07:15
  • I'm not so sure about the settings files, sorry. If you want to get the index time even lower, don't use an update script. What I would experiment with (if you are incrementing one field in the update) is to calculate the result of the modification, create the new document in pyes, and then just overwrite the old document. (Use the normal put API, as you are doing in step 3 of your question; see the sketch after these comments.) – ramseykhalaf Jul 25 '13 at 07:24
  • Also I forgot to mention, you should look at the `_bulk` [api from the es docs](http://www.elasticsearch.org/guide/reference/api/bulk/) – ramseykhalaf Jul 25 '13 at 07:40
  • It's unbelievable: I used a bulk size of 400, and it took only 2 seconds to finish this job. – Jimmy Lin Jul 25 '13 at 10:12
  • @Jimmy you used bulk size 400? You mean that when you created the connection, you did conn = ES("server:9200", bulk_size=400)? I did that but the performance is the same. – B.Mr.W. Oct 14 '13 at 21:58
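
A rough sketch of the overwrite approach ramseykhalaf describes in the comments above (assuming the running count can be kept on the client side; field and variable names are placeholders):

    counts = {}   # running counters kept client-side

    def store_entry(entry, doc_id):
        # Work out the new count locally instead of running an update script on the server.
        counts[doc_id] = counts.get(doc_id, 0) + 1
        doc = dict(entry, Count=counts[doc_id])
        # Overwrite the old document with a plain index (put) request: no search, no script.
        conn.index(doc, indexName, TypeName, doc_id)
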
  • You can use the versioning feature of Elasticsearch. If you are assigning your document IDs yourself, it's pretty easy: indexing the same ID again simply re-indexes (overwrites) the data, so you can skip the search-then-update step.

  • You should use the bulk API for indexing (a batch size of 1000-5000 is good); see the sketch below.

  • Another cause of bad indexing performance is the configuration in config/elasticsearch.yml; you can use these hints to increase indexing performance.
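
For example, with pyes you can queue index requests and let the client send them in batches (a rough sketch; bulk_size and the flush call may differ slightly between pyes versions):

    import pyes

    conn = pyes.ES('server:9200', bulk_size=1000)   # batch size in the 1000-5000 range
    indexName, TypeName = "myindex", "mytype"       # placeholders

    for doc_id, doc in enumerate(entries):          # entries: your 3000 rows
        # bulk=True only queues the request; pyes sends one _bulk request
        # automatically every bulk_size operations.
        conn.index(doc, indexName, TypeName, doc_id, bulk=True)

    # Send whatever is still sitting in the queue at the end.
    conn.flush_bulk(forced=True)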

shyos