
I would like to retrieve all the documents from Elasticsearch, so I referred to the Search Scroll API.

My problem is that it does not return all the documents: one index has 36 documents, but only 26 are returned.

I also checked another index with more than 10k documents, and there the last 10 documents are not returned either.

I really don't know why it was returning it like that! Any help will be appreciated! Thanks in advance!

Below is the code I've tried:

final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("myindex");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query("")//here some query;
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); 
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();

while (searchHits != null && searchHits.length > 0) { 
    
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId); 
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
    for (SearchHit hit : searchHits) {
       String source=hit.getSourceAsString();
    }
}

ClearScrollRequest clearScrollRequest = new ClearScrollRequest(); 
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();
Tom Slabbaert
tisispa1
  • Are you doing one request or more? The Scroll API does not return all documents in one request, instead you initialize a context with the scroll parameter in your first request and then do subsequent requests after that, passing the received scroll ID that identifies your context. You will get all the results throughout all these requests in batches. – zsltg Mar 25 '20 at 08:29
  • @Zsolt I followed the code from the link mentioned above. Could you please look at it and tell me if I missed anything? – tisispa1 Mar 25 '20 at 09:07
  • Can you include a code snippet in your question that shows how you do the requests? It is hard to tell what is missing without that. You need to do requests in a loop to get all the results as shown in the "Full example" at the end of the page you referenced. – zsltg Mar 25 '20 at 09:11
  • @Zsolt Yes, I referred to the full example. – tisispa1 Mar 25 '20 at 09:20
  • I'm assuming you removed some of the code. It looks like you are processing the results at the end of the while loop, is that correct? Please note that the first request, which you execute before the loop, also returns a set of results. Do you process that too? If not, that might explain the missing results. – zsltg Mar 25 '20 at 11:19

2 Answers


Today I faced the same problem while working with an example from:

Elastic Scroll API

First of all, about the documents you are missing: 10 is the default value for the size of a request, so we can suppose that one of your batches wasn't handled properly. In your code, the first batch of 10 documents isn't handled:

SearchHit[] searchHits = searchResponse.getHits().getHits();

Before the while loop, you should iterate over your searchHits. This was not clear to me from the official documentation at first.
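The effect can be reproduced without a cluster. Below is a minimal sketch in plain Java, not the Elasticsearch client: `FakeScroll`, `ScrollLoopDemo` and their methods are hypothetical names made up for illustration, and the fake simply hands out batches of ids until it runs out. It assumes 36 documents and a batch size of 10, as in the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a scroll context; NOT an Elasticsearch class. */
class FakeScroll {
    private final int total;
    private final int batchSize;
    private int position = 0;

    FakeScroll(int total, int batchSize) {
        this.total = total;
        this.batchSize = batchSize;
    }

    /** Returns the next batch of document ids, or an empty list once exhausted. */
    List<Integer> nextBatch() {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < batchSize && position < total) {
            batch.add(position++);
        }
        return batch;
    }
}

class ScrollLoopDemo {
    /** Same ordering as the question: fetch the next batch, then process it. */
    static int fetchThenProcess(int total, int batchSize) {
        FakeScroll scroll = new FakeScroll(total, batchSize);
        List<Integer> hits = scroll.nextBatch(); // plays the role of the initial search
        int processed = 0;
        while (!hits.isEmpty()) {
            hits = scroll.nextBatch();           // overwrites the batch that was never processed
            processed += hits.size();
        }
        return processed;
    }

    /** Corrected ordering: process the current batch, then fetch the next one. */
    static int processThenFetch(int total, int batchSize) {
        FakeScroll scroll = new FakeScroll(total, batchSize);
        List<Integer> hits = scroll.nextBatch(); // plays the role of the initial search
        int processed = 0;
        while (!hits.isEmpty()) {
            processed += hits.size();
            hits = scroll.nextBatch();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(fetchThenProcess(36, 10)); // prints 26
        System.out.println(processThenFetch(36, 10)); // prints 36
    }
}
```

The first variant always loses exactly one batch, which matches both symptoms in the question: 26 of 36 documents, and 10 missing from the larger index.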

pavel_v

You should change your while loop logic to execute the hit iteration first and the scroll after.

while (searchHits != null && searchHits.length > 0) {

    // execute this block first otherwise the scroll will overwrite the initial hits.
    for (SearchHit hit : searchHits) {
        String source=hit.getSourceAsString();
    }

    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}

Another thing to consider is that you can increase the response hit size. From the docs:

The index.max_result_window which defaults to 10,000 is a safeguard, search requests take heap memory and time proportional to from + size.

So the default value of max_result_window is 10k hits, and you can set it to something else. This means you can fetch up to 10k hits in one search call instead of paginating unnecessarily.

You can do this by specifying the size property for searchSourceBuilder before executing the search call like so:

searchSourceBuilder.size(10000); 
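To get a rough feel for what a larger size saves, here is a small sketch that counts the requests the while loop above would issue. It is plain arithmetic, not an Elasticsearch API: `ScrollRequestCount` and `requestsNeeded` are hypothetical names, and it assumes the loop only stops after a scroll call returns an empty batch.

```java
/** Hypothetical helper for illustration only; not part of any Elasticsearch library. */
class ScrollRequestCount {
    /**
     * Requests issued by a process-then-scroll loop: one initial search,
     * one scroll per remaining batch, and one last scroll that comes back empty.
     */
    static long requestsNeeded(long totalHits, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (totalHits == 0) {
            return 1; // the initial search is empty, so the loop never runs
        }
        long batches = (totalHits + size - 1) / size; // ceiling division
        return batches + 1;
    }

    public static void main(String[] args) {
        System.out.println(requestsNeeded(36, 10));    // prints 5
        System.out.println(requestsNeeded(36, 10000)); // prints 2
    }
}
```

Under these assumptions, the 36-document index needs five requests with the default size of 10 and two with a size of 10000.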
Tom Slabbaert