1

The code below returns a client-side timeout (in the Elasticsearch client) when the number of records is high.

CompletableFuture<BulkByScrollResponse> future = new CompletableFuture<>();
client.reindexAsync(request, RequestOptions.DEFAULT, new ActionListener<BulkByScrollResponse>() {
    @Override
    public void onResponse(BulkByScrollResponse bulkByScrollResponse) {
        future.complete(bulkByScrollResponse);
    }

    @Override
    public void onFailure(Exception e) {
        future.completeExceptionally(e);
    }
});
BulkByScrollResponse response = future.get(10, TimeUnit.MINUTES); // the client timeout occurred before this timeout

Below is the client config:

connectTimeout: 60000
socketTimeout: 600000
maxRetryTimeoutMillis: 600000

Is there a way to wait indefinitely until the re-indexing completes?

Ruchira Gayan Ranaweera
  • 34,993
  • 17
  • 75
  • 115

2 Answers

1

Submit the reindex request as a task:

TaskSubmissionResponse task = esClient.submitReindexTask(reindex, RequestOptions.DEFAULT);

Acquire the task ID:

TaskId taskId = new TaskId(task.getTask());
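(For reference, the task string returned by `submitReindexTask` is, as far as I know, of the form `nodeId:taskNumber`, and `TaskId` just splits it into those two parts. A rough stand-in sketch, where `TaskIdParts` is a made-up class, not part of the client:)

```java
// Hypothetical stand-in for org.elasticsearch.tasks.TaskId: splits a
// "nodeId:taskNumber" string into its node id and numeric task id.
public class TaskIdParts {
    public final String nodeId;
    public final long id;

    public TaskIdParts(String taskString) {
        int colon = taskString.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("malformed task id: " + taskString);
        }
        this.nodeId = taskString.substring(0, colon);
        this.id = Long.parseLong(taskString.substring(colon + 1));
    }
}
```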

Then check the task status periodically:

GetTaskRequest taskQuery = new GetTaskRequest(taskId.getNodeId(), taskId.getId());
GetTaskResponse taskStatus;
do {
    Thread.sleep(TimeUnit.MINUTES.toMillis(1));
    taskStatus = esClient.tasks()
            .get(taskQuery, RequestOptions.DEFAULT)
            .orElseThrow(() -> new IllegalStateException("Reindex task not found. id=" + taskId));
} while (!taskStatus.isCompleted());
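The polling loop above can be factored into a small, self-contained helper; here the `BooleanSupplier` stands in for the `esClient.tasks().get(...)` status check (a sketch, not part of the Elasticsearch client):

```java
import java.util.function.BooleanSupplier;

// Generic poll-until-complete helper. The BooleanSupplier is assumed to
// wrap the actual task-status check (e.g. taskStatus.isCompleted()).
public class TaskPoller {
    // Polls `isCompleted` up to `maxAttempts` times, sleeping `pollMillis`
    // between polls. Returns true as soon as completion is observed.
    public static boolean awaitCompletion(BooleanSupplier isCompleted,
                                          long pollMillis,
                                          int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isCompleted.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore interrupt flag
                return false;
            }
        }
        return isCompleted.getAsBoolean(); // one final check after the last sleep
    }

    public static void main(String[] args) {
        // Simulated task that "completes" on the third poll.
        int[] calls = {0};
        boolean done = awaitCompletion(() -> ++calls[0] >= 3, 10, 10);
        System.out.println(done); // prints "true"
    }
}
```

Bounding the attempts (instead of looping forever) gives you a place to fail loudly if the reindex task never finishes.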

The Elasticsearch Java API documentation about task handling just sucks.

Ref

puppylpg
  • 909
  • 10
  • 23
-1

I don't think it's a good idea to wait indefinitely for the re-indexing process to complete, or to set a very high timeout; that is not a proper fix and will cause more harm than good.

Instead, you should examine the response and add more debug logging to find the root cause and address it. Also, please have a look at my tips to improve re-indexing speed, which should fix some of your underlying issues.

Amit
  • 30,756
  • 6
  • 57
  • 88
  • 2
  • Thanks for your answer. Improvements have already been made as per the re-indexing API (increased timeout, added slicing). Now the timeout is the only concern: when the number of records is very high, re-indexing throws a timeout exception. Changing the timeout from time to time is not a good option, which is why I need to either `wait for re-indexing completion` or `retry by increasing the timeout on the fly`. But I couldn't find a good reference for retrying upon timeout. – Ruchira Gayan Ranaweera Dec 30 '20 at 08:10
  • @RuchiraGayanRanaweera Makes sense, but apart from the reindex API there are a ton of improvements you can make to boost performance; please go through my short tips and let me know which ones you have implemented. – Amit Dec 31 '20 at 09:45