
We started using the Elasticsearch high-level REST client recently, and we use the scroll API to fetch large sets of data from ES. We see a recurring pattern of high CPU utilization, as follows:

[screenshot: CPU utilization graph]

The pattern repeats every 30 minutes, and we have no clue what's going on. We also see an exception in Elasticsearch:

[2021-05-12T04:19:29,516][DEBUG][o.e.a.s.TransportSearchScrollAction] [node-2] [93486247] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [node-3][10.160.86.222:7550][indices:data/read/search[phase/query/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [93486247]
    at org.elasticsearch.search.SearchService.getExecutor(SearchService.java:496) ~[elasticsearch-6.8.9.jar:6.8.9]
    at org.elasticsearch.search.SearchService.runAsync(SearchService.java:373) ~[elasticsearch-6.8.9.jar:6.8.9]
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:435) ~[elasticsearch-6.8.9.jar:6.8.9]
    at org.elasticsearch.action.search.SearchTransportService$8.messageReceived(SearchTransportService.java:376) ~[elasticsearch-6.8.9.jar:6.8.9]
    at org.elasticsearch.action.search.SearchTransportService$8.messageReceived(SearchTransportService.java:373) ~[elasticsearch-6.8.9.jar:6.8.9]
    at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.9.jar:6.8.9]

The high-level client code being used is the usual code given in the official documentation:

        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        searchRequest.scroll(scroll);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

        if (StringUtils.isNotBlank(keyword)) {
            LOG.info("Searching for keyword: {}", keyword);
            boolQueryBuilder.must(QueryBuilders.multiMatchQuery(keyword, INDEXED_FIELDS));
        }

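        // NOTE: "param1" in the blocks below stands in for several different index fields;
        // the real field names were redacted (see the comments under the question).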
        if(StringUtils.isNotBlank(param1)) {
            boolQueryBuilder.filter(QueryBuilders.termQuery("param1", param1));
        }

        if(Objects.nonNull(param1)) {
            boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
        }

        if(Objects.nonNull(param1)) {
            boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
        }

        if(Objects.nonNull(param1)) {
            boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
        }

        if(Objects.nonNull(param1)) {
            boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
        }

        searchSourceBuilder.query(boolQueryBuilder);

        searchRequest.source(searchSourceBuilder);

        List<Object1> statuses = new ArrayList<>();
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            String scrollId = searchResponse.getScrollId();
            SearchHit[] searchHits = searchResponse.getHits().getHits();

            while (searchHits != null && searchHits.length > 0) {
                for (SearchHit hit : searchHits) {
                    Object1 agent = JsonUtil.parseJson(hit.getSourceAsString(), Object1.class);
                    statuses.add(agent);
                }
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(scroll);   
                searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId();
                searchHits = searchResponse.getHits().getHits();
            }

            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            boolean succeeded = clearScrollResponse.isSucceeded();
        } catch (IOException e) {
            LOG.error("Scroll search failed", e);
        }
  • It looks like your scrolls are lasting too long and hitting the timeout. Can you post the relevant client code as well? – Val May 12 '21 at 08:43
  • Added the code, ignore param1, it's actually different index field names, changed for official reasons – Mahesh H Viraktamath May 12 '21 at 09:00
  • So your scroll timeout is 1 minute, but I guess each scroll takes longer than that, which makes the scroll context expire and get garbage collected. So the next scroll has nothing to scroll over. You should maybe increase the timeout to a higher value. The CPU will not necessarily go down (as scroll queries are expensive), but at least you won't get the exception anymore. – Val May 12 '21 at 09:07
  • But it doesn't explain the systematic pattern of the peaks every 30 minutes. – Mahesh H Viraktamath May 12 '21 at 11:54
  • That doesn't, but there were two different issues in your question. One was the exception, which I explained how to fix by increasing the timeout (or decreasing the number of fetched documents). The second is the CPU peaks. When are you sending your scrolls? Are they during the same time frame as when those peaks occur? – Val May 12 '21 at 12:32
    I think I figured it out: the example mentioned in the official doc has a mistake. It clears just the last scrollId; all the others created inside the while loop stay open until the timeout. That was the cause of these peaks in our case. – Mahesh H Viraktamath May 12 '21 at 16:27
  • Weird, because you are only supposed to clear the scroll when you're done scrolling. All iterations should just "carry forward" the last scroll context. You're not supposed to create and clear a new scroll for each iteration. – Val May 12 '21 at 16:29
  • No, I am not creating a scroll context inside the while loop; I am carrying the existing scroll context forward in each iteration. – Mahesh H Viraktamath May 12 '21 at 16:33
  • That's what I'm saying, your code looks ok as you're carrying forward the last scroll id and only clearing it after the last iteration. – Val May 12 '21 at 16:36
  • Actually, I also raised the scroll size to 2k, so I have to check our monitoring logs to see which change actually made the difference, I will update this space soon, thanks a lot @Val – Mahesh H Viraktamath May 12 '21 at 16:36
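
Putting the comment thread together (a longer keep-alive, a bigger page size, and releasing the scroll context in a finally block), a minimal sketch of the adjusted loop might look like the following. The 5-minute keep-alive, the 2000 page size, and the fetchAll method name are illustrative assumptions, not values confirmed by the discussion; Object1, JsonUtil, INDEX_NAME and the query builder come from the question's code.

    // Sketch only: the timeout, page size, and method name are illustrative, not prescriptive.
    private List<Object1> fetchAll(RestHighLevelClient client, BoolQueryBuilder boolQueryBuilder) throws IOException {
        // Longer keep-alive so a slow page does not let the scroll context expire mid-scroll.
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(5L));

        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        searchRequest.scroll(scroll);

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(boolQueryBuilder);
        searchSourceBuilder.size(2000); // larger pages mean fewer scroll round-trips
        searchRequest.source(searchSourceBuilder);

        List<Object1> statuses = new ArrayList<>();
        String scrollId = null;
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            SearchHit[] searchHits = searchResponse.getHits().getHits();

            while (searchHits != null && searchHits.length > 0) {
                for (SearchHit hit : searchHits) {
                    statuses.add(JsonUtil.parseJson(hit.getSourceAsString(), Object1.class));
                }
                // Carry the same scroll context forward; the response may hand back a new id.
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(scroll);
                searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId();
                searchHits = searchResponse.getHits().getHits();
            }
        } finally {
            // Release the context even if an iteration throws, so it does not sit on the
            // cluster until the keep-alive expires.
            if (scrollId != null) {
                ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
                clearScrollRequest.addScrollId(scrollId);
                client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            }
        }
        return statuses;
    }

Per the last comment, it is still unverified whether the keep-alive, the page size, or the clear-scroll handling is what removed the 30-minute CPU peaks; the sketch just collects the three adjustments in one place.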

0 Answers