I need to read data from Elasticsearch 1.7 and index it into Elasticsearch 6.7, since there is no direct upgrade path between those versions. That is almost 5 TB of data in about 200 million records. We are using the Elasticsearch REST high-level client (6.7.2) with the search-and-scroll approach, but we are not able to scroll using the returned scroll ID. The other approach we tried pages with from and a batch size: reads are fast at first, but become very slow as the from offset increases. What is the best approach?
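Whichever way the data is read from 1.7, the write side into 6.7 usually goes through the bulk API rather than one index call per document. Below is a minimal sketch using the 6.7 high-level client. The destination index name and the "_doc" type are assumptions. The map of document ID to source JSON stands in for whatever the read side produces, for example SearchHit.getId() and SearchHit.getSourceAsString().

```java
import java.io.IOException;
import java.util.Map;

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

/** Writes one batch of documents read from ES 1.7 into ES 6.7 (sketch). */
public final class BatchWriter {

    private BatchWriter() {}

    /** Builds a bulk request from document ID -> source JSON, keeping the original IDs. */
    public static BulkRequest toBulk(String destIndex, Map<String, String> docsById) {
        BulkRequest bulk = new BulkRequest();
        for (Map.Entry<String, String> doc : docsById.entrySet()) {
            // 6.x still requires a mapping type; "_doc" is assumed here as the single type.
            bulk.add(new IndexRequest(destIndex, "_doc", doc.getKey())
                    .source(doc.getValue(), XContentType.JSON));
        }
        return bulk;
    }

    /** Sends one batch and throws if any document in it was rejected. */
    public static void write(RestHighLevelClient client, String destIndex, Map<String, String> docsById)
            throws IOException {
        if (docsById.isEmpty()) {
            return; // an empty bulk request fails validation, so skip it
        }
        BulkResponse response = client.bulk(toBulk(destIndex, docsById), RequestOptions.DEFAULT);
        if (response.hasFailures()) {
            throw new IOException(response.buildFailureMessage());
        }
    }
}
```

Keeping the original _id values makes a failed run safe to restart: re-sending a batch overwrites documents instead of duplicating them.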
1st approach: search and scroll
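import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// client is the RestHighLevelClient (6.7.2) pointed at the ES 1.7 cluster
SearchRequest searchRequest = new SearchRequest("indexname");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(10);
searchRequest.source(searchSourceBuilder);
searchRequest.scroll(TimeValue.timeValueMinutes(2));
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
// the first batch of hits is in searchResponse.getHits()
boolean run = true;
while (run) {
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(TimeValue.timeValueSeconds(60));
    // this call fails with the exception below
    SearchResponse searchScrollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchScrollResponse.getScrollId();
    SearchHits hits = searchScrollResponse.getHits();
    if (hits.getHits().length == 0) {
        run = false;
    }
}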
Exception:
Exception in thread "main" ElasticsearchStatusException[Elasticsearch exception [type=exception, reason=ElasticsearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0]; ]]
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
    at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2050)
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2026)
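The error points to a protocol mismatch rather than a bad scroll ID. Decimal 123 is the ASCII code for "{", so the 1.7 node is trying to Base64-decode a JSON body. The 6.7 high-level client wraps the scroll ID in a JSON object, but ES 1.x reads the raw request body as the scroll ID.

The following is a minimal sketch, not a verified drop-in. It uses the 6.7 low-level RestClient (obtainable via client.getLowLevelClient()) to talk to 1.7 in its own format. It also uses 1.x scan-and-scroll (search_type=scan), which skips sorting and is the 1.x way to walk a whole index. Two points are assumptions: that index names are passed through unchanged, and that the 6.7 SearchResponse parser tolerates 1.7 response bodies. The successful client.search call in the question suggests it does.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.common.xcontent.DeprecationHandler;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.common.xcontent.XContentType;

/** Scan-and-scroll against an ES 1.x cluster using the 6.7 low-level REST client (sketch). */
public final class LegacyScanScroll {

    private LegacyScanScroll() {}

    /** Opens a 1.x scan. The first response carries only a scroll ID, no hits. */
    public static Request startScanRequest(String index, int sizePerShard, String keepAlive) {
        Request request = new Request("POST", "/" + index + "/_search");
        request.addParameter("search_type", "scan"); // 1.x only; removed in later versions
        request.addParameter("scroll", keepAlive);
        // With search_type=scan, size is applied per shard, not per page.
        request.setJsonEntity("{\"size\":" + sizePerShard + ",\"query\":{\"match_all\":{}}}");
        return request;
    }

    /** ES 1.x reads the raw body as the scroll ID, so send it as plain text, not JSON. */
    public static Request scrollRequest(String scrollId, String keepAlive) {
        Request request = new Request("POST", "/_search/scroll");
        request.addParameter("scroll", keepAlive);
        request.setEntity(new NStringEntity(scrollId, ContentType.TEXT_PLAIN));
        return request;
    }

    /** Executes a request and parses the body with the 6.7 parser (assumed to accept 1.7 output). */
    public static SearchResponse execute(RestClient lowLevelClient, Request request) throws IOException {
        Response response = lowLevelClient.performRequest(request);
        try (InputStream body = response.getEntity().getContent();
             XContentParser parser = XContentType.JSON.xContent().createParser(
                     NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, body)) {
            return SearchResponse.fromXContent(parser);
        }
    }
}
```

Usage follows the same loop as the question:

1. Call execute(lowLevelClient, startScanRequest("indexname", 500, "2m")) once to get the first scroll ID.
2. Repeatedly call execute(lowLevelClient, scrollRequest(scrollId, "2m")), taking scrollId from each response, until a page comes back with no hits.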
2nd approach: paging with from and size

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;

int offset = 0;
boolean run = true;
while (run) {
    SearchRequest searchRequest = new SearchRequest("indexname");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(offset);
    searchSourceBuilder.size(500);
    searchRequest.source(searchSourceBuilder);
    long start = System.currentTimeMillis();
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    long end = System.currentTimeMillis();
    SearchHits hits = searchResponse.getHits();
    // the printed time grows as offset grows
    System.out.println(" Total hits : " + hits.totalHits + " time : " + (end - start));
    offset += 500;
    if (hits.getHits().length == 0) {
        run = false;
    }
}
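The slowdown is inherent to from/size paging. Under the default query-then-fetch search, every shard has to find its top from + size documents and send them to the coordinating node. That node merges all of them only to keep the requested page, so per-page work grows linearly with the offset.

A small arithmetic sketch follows. The shard count of 5 is an assumption (the ES 1.x default), not something stated in the question.

```java
/** Rough cost model for from/size paging (sketch; ignores the fetch phase and caching). */
public final class DeepPagingCost {

    private DeepPagingCost() {}

    /** Entries the coordinating node must merge to serve one page: each shard sends from + size. */
    public static long entriesMergedPerPage(int shards, long from, int size) {
        return (long) shards * (from + size);
    }

    public static void main(String[] args) {
        // First page versus a page one million documents deep, 5 shards, size 500.
        System.out.println(entriesMergedPerPage(5, 0, 500));         // 2500
        System.out.println(entriesMergedPerPage(5, 1_000_000, 500)); // 5002500
    }
}
```

Near the end of a 200-million-document index, the same formula gives roughly a billion entries per page. This is why scroll (scan-and-scroll on 1.x) is the usual tool for a full export.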
Is there any other approach to read the data?
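If the 6.7 cluster can reach the 1.7 cluster over HTTP, another option is to let Elasticsearch copy the data itself with reindex from remote. The 6.x reindex documentation describes this as working against older remote versions. The old host must be allowed through reindex.remote.whitelist in the 6.7 elasticsearch.yml (for example oldhost:9200). Remote reindex buffers each batch on heap (100mb by default), so indices with very large documents need a smaller batch size.

Below is a sketch with the low-level client. The host, index names and batch size are placeholders.

```java
import org.elasticsearch.client.Request;

/** Builds a 6.7 _reindex request that pulls an index from a remote ES 1.7 cluster (sketch). */
public final class RemoteReindex {

    private RemoteReindex() {}

    public static Request build(String remoteHost, String sourceIndex, String destIndex, int batchSize) {
        Request request = new Request("POST", "/_reindex");
        // Run as a background task; the response contains a task ID to poll with the tasks API.
        request.addParameter("wait_for_completion", "false");
        request.setJsonEntity("{"
                + "\"source\":{"
                + "\"remote\":{\"host\":\"" + remoteHost + "\"},"
                + "\"index\":\"" + sourceIndex + "\","
                + "\"size\":" + batchSize
                + "},"
                + "\"dest\":{\"index\":\"" + destIndex + "\"}"
                + "}");
        return request;
    }
}
```

Usage: run lowLevelClient.performRequest(RemoteReindex.build("http://oldhost:9200", "indexname", "indexname", 1000)) against the 6.7 cluster. The request is sent to the new cluster, which pulls from the old one, so no documents pass through the application.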