I am using the .From() and .Size() methods to retrieve all documents from Elasticsearch search results.

Below is a sample:

ISearchResponse<dynamic> bResponse = ObjElasticClient.Search<dynamic>(s => s.From(0).Size(25000).Index("accounts").AllTypes().Query(Query));

Recently I came across the scroll feature of Elasticsearch. It looks like a better approach than From() and Size(), especially for fetching large amounts of data.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

I am looking for an example of the scroll feature in the NEST API.

Can someone please provide a NEST example?

Thanks, Sameer

Frederik Struck-Schøning
Sameer Deshmukh
  • There is a "new" scroll feature, which can help: https://stackoverflow.com/questions/55115517/how-to-get-all-documents-by-index-in-easticsearch-using-nest/55120639#55120639 – suchoss Nov 02 '20 at 11:25

4 Answers


Here's an example of using scroll with NEST and C#. It works with 5.x and 6.x.

public IEnumerable<T> GetAllDocumentsInIndex<T>(string indexName, string scrollTimeout = "2m", int scrollSize = 1000) where T : class
{
    ISearchResponse<T> initialResponse = this.ElasticClient.Search<T>
        (scr => scr.Index(indexName)
             .From(0)
             .Take(scrollSize)
             .MatchAll()
             .Scroll(scrollTimeout));

    List<T> results = new List<T>();

    // ServerError is null for connection-level failures, so guard it before reading Reason
    if (!initialResponse.IsValid || string.IsNullOrEmpty(initialResponse.ScrollId))
        throw new Exception(initialResponse.ServerError?.Error?.Reason ?? "Initial scroll search failed");

    if (initialResponse.Documents.Any())
        results.AddRange(initialResponse.Documents);

    string scrollid = initialResponse.ScrollId;
    bool isScrollSetHasData = true;
    while (isScrollSetHasData)
    {
        ISearchResponse<T> loopingResponse = this.ElasticClient.Scroll<T>(scrollTimeout, scrollid);

        // Fail loudly instead of silently returning a partial result set
        if (!loopingResponse.IsValid)
            throw new Exception(loopingResponse.ServerError?.Error?.Reason ?? "Scroll request failed");

        results.AddRange(loopingResponse.Documents);
        scrollid = loopingResponse.ScrollId; // always continue from the most recent scroll id
        isScrollSetHasData = loopingResponse.Documents.Any();
    }

    // Release the server-side search context once every page has been read
    this.ElasticClient.ClearScroll(new ClearScrollRequest(scrollid));
    return results;
}

It's from: http://telegraphrepaircompany.com/elasticsearch-nest-scroll-api-c/
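Stripped of the NEST specifics, the loop in this answer follows a small protocol: open the scroll, keep passing the most recent scroll id until a page comes back empty, then clear the scroll context. The sketch below models only that protocol, using a hypothetical `IScrollSource<T>` interface and an in-memory fake (`InMemoryScrollSource<T>`) so it runs without a cluster. Neither type is part of NEST; in real code the three calls correspond to `Search<T>(... .Scroll(...))`, `Scroll<T>(...)` and `ClearScroll(...)` as used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction of a scroll cursor (not a NEST type), used only to show the loop shape.
public interface IScrollSource<T>
{
    (IReadOnlyList<T> Page, string ScrollId) Start(int pageSize);   // like Search<T>(...Scroll(...))
    (IReadOnlyList<T> Page, string ScrollId) Next(string scrollId); // like Scroll<T>(timeout, id)
    void Clear(string scrollId);                                     // like ClearScroll(...)
}

// In-memory fake: serves fixed-size pages and issues a new scroll id on every call.
public sealed class InMemoryScrollSource<T> : IScrollSource<T>
{
    private readonly List<T> _data;
    private int _offset, _pageSize, _counter;
    private string _currentId;
    public List<string> ClearedIds { get; } = new List<string>();

    public InMemoryScrollSource(IEnumerable<T> data) => _data = data.ToList();

    public (IReadOnlyList<T> Page, string ScrollId) Start(int pageSize)
    {
        _pageSize = pageSize;
        _offset = 0;
        return NextPage();
    }

    public (IReadOnlyList<T> Page, string ScrollId) Next(string scrollId)
    {
        // Only the most recently issued id is accepted, so a loop that reuses an old id fails fast.
        if (scrollId != _currentId)
            throw new InvalidOperationException("stale scroll id: " + scrollId);
        return NextPage();
    }

    public void Clear(string scrollId) => ClearedIds.Add(scrollId);

    private (IReadOnlyList<T> Page, string ScrollId) NextPage()
    {
        var page = _data.Skip(_offset).Take(_pageSize).ToList();
        _offset += page.Count;
        _currentId = "scroll-" + (++_counter);
        return (page, _currentId);
    }
}

public static class ScrollLoop
{
    public static List<T> ReadAll<T>(IScrollSource<T> source, int pageSize)
    {
        var results = new List<T>();
        var (page, scrollId) = source.Start(pageSize);
        try
        {
            // An empty page means the scroll is exhausted.
            while (page.Count > 0)
            {
                results.AddRange(page);
                (page, scrollId) = source.Next(scrollId); // always continue from the newest id
            }
        }
        finally
        {
            source.Clear(scrollId); // release the context even if a page request fails
        }
        return results;
    }
}
```

The fake rejects stale ids on purpose: Elasticsearch may return a different scroll id on each call, which is why the answer reassigns `scrollid` on every iteration. Putting the clear in a `finally` is a small departure from the answer, so the context is also released when a page request throws.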

Kirill Rakhman
SomeGuy1989

The internal implementation of Reindex in NEST uses scroll to move documents from one index to another.

It should be a good starting point.

Below is the relevant code from GitHub:

var page = 0;
var searchResult = this.CurrentClient.Search<T>(
    s => s
        .Index(fromIndex)
        .AllTypes()
        .From(0)
        .Size(size)
        .Query(this._reindexDescriptor._QuerySelector ?? (q=>q.MatchAll()))
        .SearchType(SearchType.Scan)
        .Scroll(scroll)
    );
if (searchResult.Total <= 0)
    throw new ReindexException(searchResult.ConnectionStatus, "index " + fromIndex + " has no documents!");
IBulkResponse indexResult = null;
do
{
    var result = searchResult;
    searchResult = this.CurrentClient.Scroll<T>(s => s
        .Scroll(scroll)
        .ScrollId(result.ScrollId)
    );
    if (searchResult.Documents.HasAny())
        indexResult = this.IndexSearchResults(searchResult, observer, toIndex, page);
    page++;
} while (searchResult.IsValid && indexResult != null && indexResult.IsValid && searchResult.Documents.HasAny());

You can also take a look at the integration test for Scroll:

[Test]
public void SearchTypeScan()
{
    var scanResults = this.Client.Search<ElasticsearchProject>(s => s
        .From(0)
        .Size(1)
        .MatchAll()
        .Fields(f => f.Name)
        .SearchType(SearchType.Scan)
        .Scroll("2s")
    );
    Assert.True(scanResults.IsValid);
    Assert.False(scanResults.FieldSelections.Any());
    Assert.IsNotNullOrEmpty(scanResults.ScrollId);

    var results = this.Client.Scroll<ElasticsearchProject>(s=>s
        .Scroll("4s") 
        .ScrollId(scanResults.ScrollId)
    );
    var hitCount = results.Hits.Count();
    while (results.FieldSelections.Any())
    {
        Assert.True(results.IsValid);
        Assert.True(results.FieldSelections.Any());
        Assert.IsNotNullOrEmpty(results.ScrollId);
        var localResults = results;
        results = this.Client.Scroll<ElasticsearchProject>(s=>s
            .Scroll("4s")
            .ScrollId(localResults.ScrollId));
        hitCount += results.Hits.Count();
    }
    Assert.AreEqual(scanResults.Total, hitCount);
}
Frederik Struck-Schøning
Rob
  • Thank you. Search type 'scan' is not supported with aggregation queries. So is it still a good idea to use Scroll without search type 'scan'? – Sameer Deshmukh Jul 10 '15 at 12:53
  • Yeah, but scroll will be less efficient https://www.elastic.co/guide/en/elasticsearch/guide/current/scan-scroll.html#scan-scroll. Depends on your use case. – Rob Jul 10 '15 at 13:04

I took the liberty of rewriting the fine answer from Michael to be async and a bit less verbose (NEST 6.x):

public async Task<IList<T>> RockAndScroll<T>(
    string indexName,
    string scrollTimeoutMinutes = "2m",
    int scrollPageSize = 1000
) where T : class
{
    var searchResponse = await this.ElasticClient.SearchAsync<T>(sd => sd
        .Index(indexName)
        .From(0)
        .Take(scrollPageSize)
        .MatchAll()
        .Scroll(scrollTimeoutMinutes));

    var results = new List<T>();

    while (true)
    {
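        // ServerError is null for connection-level failures, so fall back to DebugInformation
        if (!searchResponse.IsValid || string.IsNullOrEmpty(searchResponse.ScrollId))
            throw new Exception($"Search error: {searchResponse.ServerError?.Error?.Reason ?? searchResponse.DebugInformation}");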

        if (!searchResponse.Documents.Any())
            break;

        results.AddRange(searchResponse.Documents);
        searchResponse = await ElasticClient.ScrollAsync<T>(scrollTimeoutMinutes, searchResponse.ScrollId);
    }

    await this.ElasticClient.ClearScrollAsync(new ClearScrollRequest(searchResponse.ScrollId));

    return results;
}
Frederik Struck-Schøning
  • A couple of comments: in `scrollTimeoutMilliseconds = "2m"`, `2m` means minutes, not milliseconds; see [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#time-units). Explanation of what exactly this parameter does is [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#scroll-search-context). You may also want to return `Task<ICollection<T>>` instead, since the results are already in a List. – Tobias J Dec 18 '19 at 20:58
  • Ah, thanks for that correction, @TobiasJ. I have renamed the variable. As for ICollection it's probably even more descriptive returning an `IList`, which effectively is an ICollection that can be accessed by index. Corrected as well – Frederik Struck-Schøning Dec 19 '19 at 14:37

I took Frederik's answer and made it an extension method that leverages IAsyncEnumerable:

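using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices; // [EnumeratorCancellation]
using System.Threading;
using Nest;

// Adapted from https://stackoverflow.com/a/56261657/1072030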
public static async IAsyncEnumerable<T> ScrollAllAsync<T>(
    this IElasticClient elasticClient,
    string indexName,
    string scrollTimeoutMinutes = "2m",
    int scrollPageSize = 1000,
    [EnumeratorCancellation] CancellationToken ct = default
) where T : class
{
    var searchResponse = await elasticClient.SearchAsync<T>(
        sd => sd
            .Index(indexName)
            .From(0)
            .Take(scrollPageSize)
            .MatchAll()
            .Scroll(scrollTimeoutMinutes),
        ct);
    
    try
    {
        while (true)
        {
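            // ServerError is null for connection-level failures, so fall back to DebugInformation
            if (!searchResponse.IsValid || string.IsNullOrEmpty(searchResponse.ScrollId))
                throw new Exception($"Search error: {searchResponse.ServerError?.Error?.Reason ?? searchResponse.DebugInformation}");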

            if (!searchResponse.Documents.Any())
                break;

            foreach(var item in searchResponse.Documents)
            {
                yield return item;
            }
        
            searchResponse = await elasticClient.ScrollAsync<T>(scrollTimeoutMinutes, searchResponse.ScrollId, ct: ct);
        }
    }
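    finally
    {
        // Skip the clear if no scroll was opened (avoids masking the original error), and don't
        // pass ct: an already-cancelled token would likely cancel the clear request as well,
        // leaving the context open until the scroll timeout expires.
        if (!string.IsNullOrEmpty(searchResponse.ScrollId))
            await elasticClient.ClearScrollAsync(new ClearScrollRequest(searchResponse.ScrollId));
    }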
}
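One practical advantage of the `IAsyncEnumerable` version is that its `finally` block also runs when the caller stops early: breaking out of `await foreach` disposes the enumerator, which executes the pending `finally` in the async iterator, so the scroll still gets cleared. The sketch below demonstrates just that behavior with a hypothetical in-memory `FakeScrollAllAsync` standing in for `ScrollAllAsync`; no Elasticsearch is involved, and the `ClearCalls` counter stands in for `ClearScrollAsync`. It assumes C# 8 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class EarlyExitDemo
{
    // Counts how often the hypothetical "clear scroll" step ran.
    public static int ClearCalls;

    // Hypothetical stand-in for ScrollAllAsync: yields 0..total-1 in pages of 3.
    public static async IAsyncEnumerable<int> FakeScrollAllAsync(
        int total,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        try
        {
            for (int start = 0; start < total; start += 3)
            {
                await Task.Yield(); // simulate one async round trip per page
                ct.ThrowIfCancellationRequested();
                foreach (var item in Enumerable.Range(start, Math.Min(3, total - start)))
                    yield return item;
            }
        }
        finally
        {
            ClearCalls++; // where ClearScrollAsync would go
        }
    }

    // Reads only the first `count` items, then breaks out of the loop.
    public static async Task<List<int>> TakeFirstAsync(int total, int count)
    {
        var taken = new List<int>();
        await foreach (var item in FakeScrollAllAsync(total))
        {
            taken.Add(item);
            if (taken.Count == count)
                break; // disposing the enumerator runs the iterator's finally block
        }
        return taken;
    }
}
```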

I'm guessing we should really be using search_after instead of the scroll API, but meh.
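For comparison, `search_after` replaces the server-side scroll context with a cursor held by the client: sort on a unique field (or add a unique tiebreaker to the sort), then request the next `size` hits whose sort values come strictly after those of the last hit on the previous page. The sketch below models only those semantics over an in-memory list, with a hypothetical `Doc` record; it does not call NEST. It assumes C# 9 or later for the record syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type: Timestamp is the primary sort key, Id a unique tiebreaker.
public sealed record Doc(long Timestamp, string Id);

public static class SearchAfterDemo
{
    // One simulated request: the first `size` docs that sort strictly after `after`.
    public static List<Doc> Page(IEnumerable<Doc> index, (long Timestamp, string Id)? after, int size) =>
        index
            .Where(d => after is null
                        || d.Timestamp > after.Value.Timestamp
                        || (d.Timestamp == after.Value.Timestamp
                            && string.CompareOrdinal(d.Id, after.Value.Id) > 0))
            .OrderBy(d => d.Timestamp)
            .ThenBy(d => d.Id, StringComparer.Ordinal)
            .Take(size)
            .ToList();

    // Drains the index; the only state carried between requests is the last hit's sort values.
    public static List<Doc> ReadAll(IReadOnlyList<Doc> index, int size)
    {
        var all = new List<Doc>();
        (long Timestamp, string Id)? cursor = null;
        while (true)
        {
            var page = Page(index, cursor, size);
            if (page.Count == 0)
                break;
            all.AddRange(page);
            var last = page[page.Count - 1];
            cursor = (last.Timestamp, last.Id);
        }
        return all;
    }
}
```

The tiebreaker is what keeps this correct: if the cursor held only `Timestamp`, documents that share the last timestamp of a page but fall onto the next page would be skipped by the strict comparison. Unlike scroll, plain `search_after` keeps nothing open on the server between requests, so there is no context to clear.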

Chase Miller