
I am using Elasticsearch 2.2.0 with the default cluster configuration. I am encountering a problem with a scan & scroll query using Spring Data Elasticsearch. When I execute the query I get an error like this:

[2016-06-29 12:45:52,046][DEBUG][action.search.type       ] [Vector] [155597] Failed to execute query phase
RemoteTransportException[[Vector][10.132.47.95:9300][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [155597]];
Caused by: SearchContextMissingException[No search context found for id [155597]]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:611)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:311)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:433)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:430)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

My 'scan & scroll' code:

public List<T> getAllElements(SearchQuery searchQuery) {
    searchQuery.setPageable(new PageRequest(0, PAGE_SIZE));
    String scrollId = elasticsearchTemplate.scan(searchQuery, 1000, false);
    List<T> allElements = new LinkedList<>();
    boolean hasRecords = true;
    while (hasRecords) {
        Page<T> page = elasticsearchTemplate.scroll(scrollId, 5000, resultMapper);
        if (page.hasContent()) {
            allElements.addAll(page.getContent());
        } else {
            hasRecords = false;
        }
    }
    elasticsearchTemplate.clearScroll(scrollId);
    return allElements;
}

When my query result size is less than the PAGE_SIZE parameter, the error occurs five times; I guess that is once per shard. When the result size is bigger than PAGE_SIZE, the error occurs a few more times. I've tried to refactor my code to not call:

Page<T> page = elasticsearchTemplate.scroll(scrollId, 5000, resultMapper);

when I'm sure that the page has no content. But that only works when PAGE_SIZE is bigger than the query result size, so it is not a real solution.

I should add that this problem occurs only on the Elasticsearch side. On the client side the errors are hidden, and in each case the query result is correct. Does anybody know what causes this issue?

Thank you for help,

Simon.

esnosek

3 Answers


I get this error when the Elasticsearch system closes the connection. Typically it's exactly what @Val said: dead connections. Things sometimes die in ES for no good reason: the master node goes down, a data node gets too congested, badly performing queries, Kibana running while you are in the middle of querying... I've been hit by all of these at one time or another and ended up with this error.

Suggestion: increase the initial connection timeout; 1000L might be too short for it to get what it needs. It won't hurt if the query ends sooner.

This also happens randomly when I try to pull too much data too quickly; you might have huge documents, and trying to pull a PAGE_SIZE of 50,000 might be a little too much. We don't know what you chose for PAGE_SIZE.

Suggestion: lower PAGE_SIZE to something like 500, or even 20, and see if the smaller values reduce the errors.

I know I have had fewer of these problems since moving to ES 2.3.3.

Antonio Ciolino

This usually happens if your search context is not alive anymore.

In your case, you're starting your scan with a timeout of 1 second, and then each scroll keeps the context alive for only 5 seconds. That's probably too low. The default duration to keep the search context alive is 1 minute, so you should probably increase it to 60 seconds like this:

String scrollId = elasticsearchTemplate.scan(searchQuery, 60000, false);
...
Page<T> page = elasticsearchTemplate.scroll(scrollId, 60000, resultMapper);
Val
  • Thank you for the answer, but I have tried that before and it does not work in my case. My sample query is very simple and takes less than one second, so that cannot be the problem. – esnosek Jun 30 '16 at 13:43
  • If you want to be certain, you can enable DEBUG logs on `org.elasticsearch.search.SearchService` and if you see `freeing search context....` debug logs then this is the issue. – Val Jun 30 '16 at 13:46
  • Thank you for trying, but after enabling DEBUG logs on that class nothing has changed. The error logs are the same as before. – esnosek Jul 04 '16 at 12:08
  • Have you tried with increased timeouts like I suggested? It's worth giving it a shot and that will probably get rid of your issue. – Val Jul 22 '16 at 03:52
  • As I have written, I tried and it does not work. Thanks – esnosek Jul 26 '16 at 14:11
  • Oh I think I know what is happening. You're using the initial scrollId all along; for each scroll request you should be using the one you get in the preceding scroll response. – Val Jul 26 '16 at 14:16
  • I have the same exact issue using ElasticSearch 2.4 and my context duration is set to 60000 (1m). I am using spring data elasticsearch, and scrolling just like the OP. I think your last comment might be on to something but I don't see a way of getting new scroll ID from previous scroll request. Is there any way to do that with spring data elasticsearch? – Sikor Dec 14 '16 at 16:53
  • @Sikor there is an [open PR](https://github.com/spring-projects/spring-data-elasticsearch/pull/120) that will support this, it's not merged yet. – Val Dec 15 '16 at 05:36
  • I used that exact code above (except I have 5000 as 5000L), and have no problems in ES. scrollId is updated in the hasNext() code inside the ElasticSearchTemplate – Antonio Ciolino Jan 17 '17 at 18:10
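The scroll-ID chaining discussed in the comments above can be illustrated with a deliberately simplified, self-contained sketch. The MockScrollService below is a hypothetical in-memory stand-in, not the real Elasticsearch API (which keeps a context alive for the whole keep-alive duration); it only demonstrates why the ID from the latest response must be fed into the next scroll call:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a scroll API (illustration only):
// every scroll call retires the old context id and hands back a fresh one,
// so the id from the latest response must always be used.
class MockScrollService {
    private final List<List<String>> pages;
    private final Map<String, Integer> contexts = new HashMap<>();
    private int nextId = 0;

    MockScrollService(List<List<String>> pages) {
        this.pages = pages;
    }

    // Analogous to scan(...): opens a context and returns its id.
    String scan() {
        return openContext(0);
    }

    // Analogous to scroll(...): returns the next page of hits plus a NEW
    // scroll id; the id that was passed in becomes invalid.
    Map.Entry<String, List<String>> scroll(String scrollId) {
        Integer pageIndex = contexts.remove(scrollId);
        if (pageIndex == null) {
            throw new IllegalStateException(
                "No search context found for id [" + scrollId + "]");
        }
        List<String> hits = pageIndex < pages.size()
            ? pages.get(pageIndex) : Collections.<String>emptyList();
        return new AbstractMap.SimpleEntry<>(openContext(pageIndex + 1), hits);
    }

    private String openContext(int pageIndex) {
        String id = "ctx-" + (nextId++);
        contexts.put(id, pageIndex);
        return id;
    }
}

public class ScrollIdDemo {
    // Correct pattern: chain the scroll id from each response.
    static List<String> collectAll(MockScrollService service) {
        List<String> all = new ArrayList<>();
        String scrollId = service.scan();
        while (true) {
            Map.Entry<String, List<String>> response = service.scroll(scrollId);
            scrollId = response.getKey(); // take the id from the latest response
            if (response.getValue().isEmpty()) {
                break;
            }
            all.addAll(response.getValue());
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> data =
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));

        System.out.println(collectAll(new MockScrollService(data))); // [a, b, c]

        // Buggy pattern (as in the question): reusing the initial id
        // fails on the second call, analogous to the
        // SearchContextMissingException in the logs above.
        MockScrollService buggy = new MockScrollService(data);
        String initialId = buggy.scan();
        buggy.scroll(initialId);
        try {
            buggy.scroll(initialId);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```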

I ran into a similar problem, and I suspect that Spring Data Elasticsearch has an internal bug in how it passes the scroll ID. In my case I just tried to scroll through the whole index, and I can rule out @Val's point that "This usually happens if your search context is not alive anymore", because the exceptions occurred regardless of the duration. Also, the exceptions started after the first page and occurred for every subsequent paging query.

In my case I could simply use elasticsearchTemplate.stream(). It uses scan & scroll internally and seems to pass the scroll ID correctly. And it's simpler to use:

SearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.matchAllQuery())
    .withPageable(new PageRequest(0, 10000))
    .build();

Iterator<Post> postIterator = elasticsearchTemplate.stream(searchQuery, Post.class);

while (postIterator.hasNext()) {
    Post post = postIterator.next();
    // process each post here
}
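Draining such an iterator into a list, to mirror the getAllElements() method from the question, is just a plain-Java loop. A small generic helper (independent of Elasticsearch, names are my own) could look like this:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorUtil {
    // Drain any Iterator into a List, e.g. the Iterator returned by
    // elasticsearchTemplate.stream(searchQuery, Post.class).
    public static <T> List<T> toList(Iterator<T> iterator) {
        List<T> result = new ArrayList<>();
        while (iterator.hasNext()) {
            result.add(iterator.next());
        }
        return result;
    }
}
```

Note that, unlike the manual scroll loop, this keeps the whole result set in memory at once, so it only makes sense when the result fits comfortably on the heap.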
ss1