Currently I have to update a field in over 1 million documents indexed in Elasticsearch. This is a complex task because the field contains metadata generated from XML files by evaluating XPath expressions. We have to loop over all the documents in the index and update this field. So, to avoid overloading the system, we decided to use the IronWorker platform.
I have read several posts about how to update millions of docs in Elasticsearch, like this one, but given that we are going to use IronWorker, there are some restrictions; for example, a task can only run for 60 minutes.
Question:
How can I loop over all the documents and update the field in each one, given the 60-minute restriction?
I thought about opening a scroll and passing the scroll_id to the next worker, but I have no idea how long it will take for the next task to be executed, so the scroll could expire and I would have to start all over.
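To make the hand-off idea concrete, here is a rough sketch of what I have in mind for each worker. It is not tied to any real client: `run_worker`, `fetch_batch`, `update_doc`, `WorkerResult`, and the 55-minute budget are all names and values I made up for illustration. `fetch_batch` stands in for whatever call returns the next page of documents plus a resume cursor (for example the scroll_id).

```python
import time
from typing import Any, Callable, List, NamedTuple, Optional, Tuple


class WorkerResult(NamedTuple):
    done: bool             # True when there are no more documents to process
    cursor: Optional[Any]  # resume position to pass to the next worker


def run_worker(
    fetch_batch: Callable[[Optional[Any]], Tuple[List[Any], Any]],
    update_doc: Callable[[Any], None],
    cursor: Optional[Any] = None,
    budget_seconds: float = 55 * 60,  # assumed safety margin under the 60 min limit
    clock: Callable[[], float] = time.monotonic,
) -> WorkerResult:
    """Process batches until the time budget is used up, then hand off."""
    deadline = clock() + budget_seconds
    while clock() < deadline:
        # Hypothetical hook: returns (documents, cursor for the following page).
        docs, next_cursor = fetch_batch(cursor)
        if not docs:
            return WorkerResult(True, None)
        for doc in docs:
            update_doc(doc)
        # Only advance the cursor once the whole batch has been updated.
        cursor = next_cursor
    # Budget exhausted: the next worker should resume from this cursor.
    return WorkerResult(False, cursor)
```

The deadline is only checked between batches, so a batch has to be small enough to finish inside the remaining margin. The open problem is still the cursor itself: if it is a scroll_id, it can expire before the next worker starts.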