Why does a bulk update never conflict with update-by-query requests in Elasticsearch?

Question

I keep two scripts running. The first sends bulk update requests to the `test` index:

while true; do
    s=$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10)
    curl -s -X POST 'localhost:9200/test/_bulk' -H 'Content-Type: application/x-ndjson' -d \
    '{ "update": { "_index": "test", "_id": "1" } }
    { "doc": { "name": "update", "foo": "'$s'" } }
    { "update": { "_index": "test", "_id": "2" } }
    { "doc": { "name": "update", "foo": "'$s'" } }
    { "update": { "_index": "test", "_id": "3" } }
    { "doc": { "name": "update", "foo": "'$s'" } }
'
    echo ''
done
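If the bulk side ever did hit a conflict, the bulk `update` action accepts a `retry_on_conflict` value in its action metadata. This is the number of times Elasticsearch retries the update after a version conflict before reporting it as failed. The sketch below assumes the same local cluster and `test` index as the script above. `build_bulk_body` and `send_bulk` are hypothetical helper names, and the retry count of 3 is an arbitrary choice.

```shell
# Hypothetical helper: prints an NDJSON bulk body that updates docs 1-3,
# asking Elasticsearch to retry each update up to 3 times on a version conflict.
build_bulk_body() {
  local s="$1"
  local id
  for id in 1 2 3; do
    printf '{ "update": { "_index": "test", "_id": "%s", "retry_on_conflict": 3 } }\n' "$id"
    printf '{ "doc": { "name": "update", "foo": "%s" } }\n' "$s"
  done
}

# Hypothetical helper: sends one bulk request built from a random value.
# --data-binary @- reads the body from stdin and keeps the newlines NDJSON needs.
send_bulk() {
  local s
  s=$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10)
  build_bulk_body "$s" | curl -s -X POST 'localhost:9200/test/_bulk' \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
  echo
}
```

Calling `send_bulk` in place of the inline `curl` in the loop keeps the original behaviour and adds the retries.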

The second sends update-by-query requests against the same documents (I have to sleep after each request because it may conflict with the previous one if requests are sent too frequently):

while true; do
    s=$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10)
    curl -s -X POST 'localhost:9200/test/_update_by_query' -H 'Content-Type: application/json' -d \
'{
    "query": {
        "match": {
            "name": {
                "query": "update"
            }
        }
    },
    "script": {
        "lang": "painless",
        "source": "ctx._source['"'foo'"'] = '"'$s'"'"
    }
}'
    echo ''
    sleep 1
done
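By default, version conflicts abort an update-by-query request. With the `conflicts=proceed` query parameter it keeps going and reports the number of conflicts in the `version_conflicts` field of the response. Passing the random value through the script's `params` instead of splicing it into `source` also keeps the script text constant, so Elasticsearch does not have to compile a new script for every request. This is a sketch assuming the same local cluster. `build_ubq_body` and `send_ubq` are hypothetical helper names.

```shell
# Hypothetical helper: prints an update-by-query body that sets foo from a
# script parameter, so the script source is identical on every request.
build_ubq_body() {
  local s="$1"
  printf '{
  "query": { "match": { "name": { "query": "update" } } },
  "script": {
    "lang": "painless",
    "source": "ctx._source.foo = params.foo",
    "params": { "foo": "%s" }
  }
}\n' "$s"
}

# Hypothetical helper: conflicts=proceed counts version conflicts
# instead of aborting the whole update-by-query request.
send_ubq() {
  local s
  s=$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10)
  build_ubq_body "$s" | curl -s -X POST \
    'localhost:9200/test/_update_by_query?conflicts=proceed' \
    -H 'Content-Type: application/json' --data-binary @-
  echo
}
```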

Judging from the output of the two scripts, the bulk responses never contain a conflict failure; all conflicts happen on the update-by-query side.

The conflict error message is:

version conflict, required seqNo [66], primary term [1]. current document has seqNo [67] and primary term [1]

From this it seems that the conflict happens when the operation is copied from the primary shard to a replica. But bulk requests also need to do that and increment the seqNo, right?
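For comparison, the Elasticsearch documentation describes update-by-query as taking a snapshot of the index when it starts and then writing each matching document only if it still has the version seen in that snapshot. A document that changes in between causes a version conflict. The same kind of conditional write can be issued by hand with the index API's `if_seq_no` and `if_primary_term` parameters. This is a sketch assuming the local cluster and `test` index from the question. `occ_url` and `conditional_write` are hypothetical helpers, and 66/1 are simply the values from the message above.

```shell
# Hypothetical helper: builds the URL for a conditional index request that
# only succeeds if doc $1 still has seq_no $2 and primary term $3.
occ_url() {
  printf 'localhost:9200/test/_doc/%s?if_seq_no=%s&if_primary_term=%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: overwrite doc 1 only if it is still at seq_no 66 / term 1.
# If another write got in first, Elasticsearch rejects it with HTTP 409
# (version_conflict_engine_exception).
conditional_write() {
  curl -s -X PUT "$(occ_url 1 66 1)" -H 'Content-Type: application/json' \
    -d '{ "name": "update", "foo": "manual" }'
  echo
}
```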

Is it possible that, sometimes, update-by-query succeeds while the bulk request conflicts and fails?

score 2 · Accepted Answer · answered Sep 30 '21 at 06:56


Your bulk requests always use the `index` command, so they either overwrite the existing document or create a new one. There can never be a conflict.

The update-by-query requests are... well, updates, so conflicts can only happen on that side.

If your update-by-query request reaches a document after a bulk request has overwritten it (that is, after the update-by-query has taken its snapshot), you get a conflict.

If your bulk request arrives after the update-by-query request has updated a document, there is no conflict: because it uses the `index` command, the bulk request simply overwrites the changes made by the update.
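To make the difference concrete: a plain bulk `index` action has no precondition, so the last writer wins. Adding `if_seq_no` and `if_primary_term` to the action metadata turns it into a conditional write, which can fail with a conflict just as update-by-query does. This is a sketch only. `bulk_index_line` and `example_body` are hypothetical helpers, and the values 10 and 1 are placeholders.

```shell
# Hypothetical helper: prints one bulk `index` action line for doc $1.
# With one argument the write is unconditional (last writer wins).
# With a seq_no ($2) and primary term ($3) it only applies if the document
# is still at that version; otherwise that item fails with a 409 conflict.
bulk_index_line() {
  if [ "$#" -ge 3 ]; then
    printf '{ "index": { "_index": "test", "_id": "%s", "if_seq_no": %s, "if_primary_term": %s } }\n' "$1" "$2" "$3"
  else
    printf '{ "index": { "_index": "test", "_id": "%s" } }\n' "$1"
  fi
}

# Hypothetical example body: doc 1 is overwritten blindly,
# doc 2 only if it is unchanged since seq_no 10 / primary term 1.
example_body() {
  bulk_index_line 1
  printf '{ "name": "update", "foo": "blind" }\n'
  bulk_index_line 2 10 1
  printf '{ "name": "update", "foo": "guarded" }\n'
}
```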

Val

Oh I see, an index request never conflicts. What if I use update instead of index in the bulk request? It still never conflicts. Is my concurrency too low (only one script sending bulk requests)? – cosimoth Sep 30 '21 at 07:08
Now I see that you changed `index` to `update` in your bulk request... Can you explain the semantics of your use case? – Val Sep 30 '21 at 07:26
For example, I add a new field to my ES schema. I need to populate that field for existing documents in an offline job using update-by-query (reindexing is too heavy). Meanwhile, my online business service keeps indexing, deleting and updating documents with bulk requests. I think there could theoretically be conflicts on the bulk request side, right? – cosimoth Sep 30 '21 at 08:32
It depends on whether your bulk requests do `index` or `update`. In your initial question you had `index` and then changed it to `update`, which has different semantics. – Val Sep 30 '21 at 09:03