
I am using the NEST client to get data from Elasticsearch. Since the index already holds about 300,000 documents, the requests take a very long time. I followed this answer.

My method looks like this:

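    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Nest;

    // Extension methods must live in a static class; the class name here is illustrative.
    public static class ElasticClientExtensions
    {
        public static async Task<IList<T>> RockAndScroll<T>(this IElasticClient client,
            string indexName,
            string scrollTimeoutMinutes = "2m",
            int scrollPageSize = 1000
        ) where T : class
        {
            // First page: match every document and open a scroll context.
            var searchResponse = await client.SearchAsync<T>(sd => sd
                .Index(indexName)
                .From(0)
                .Take(scrollPageSize)
                .MatchAll()
                .Scroll(scrollTimeoutMinutes));

            var results = new List<T>();

            while (true)
            {
                // ServerError is null for transport-level failures, so fall back to DebugInformation.
                if (!searchResponse.IsValid || string.IsNullOrEmpty(searchResponse.ScrollId))
                    throw new Exception($"Search error: {searchResponse.ServerError?.Error?.Reason ?? searchResponse.DebugInformation}");

                if (!searchResponse.Documents.Any())
                    break;

                results.AddRange(searchResponse.Documents);
                searchResponse = await client.ScrollAsync<T>(scrollTimeoutMinutes, searchResponse.ScrollId);
            }

            // Release the scroll context on the server once all pages have been read.
            await client.ClearScrollAsync(new ClearScrollRequest(searchResponse.ScrollId));

            return results;
        }
    }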

Filtering is slow as well:

    var data = await _client.RockAndScroll<ElasticRequest>("taikunrequests");
    var queryable = data.AsQueryable();

    queryable = TaikunRequestListExtension.RoleManagement(request, _userAccessor.GetCurrentRole(), currentOrganization, partnerOrgs, queryable);
    queryable = TaikunRequestListExtension.Sort(request, queryable);
    queryable = TaikunRequestListExtension.Search(request, queryable);
    queryable = TaikunRequestListExtension.Range(request, queryable);
    queryable = TaikunRequestListExtension.Filter(request, queryable);

    var count = queryable.Count();

    var result = queryable
        .Skip(request.Offset ?? 0).Take(request.Limit ?? 50).ToList();

    return new TaikunRequestList(result, count);

If I use from and size in the NEST client, I can get at most 1,000 results, and 10,000 with search_after. That is why I used the Scroll API: I need to apply the filter options across all of the data. Is there any way to improve the performance of these requests, or is there another solution?
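For reference, Elasticsearch's default `index.max_result_window` setting caps `from + size` at 10,000 hits, while `search_after` is not bound by that window. Below is a minimal sketch of a `search_after` loop in NEST 7.x as an alternative to the scroll method above. The class name `SearchAfterPaging` and the sort fields `createdAt` and `id` are hypothetical: it assumes every document has a date field plus a unique keyword `id` field, so that the sort order is total and stable. Note that, like the scroll version, this still pulls every document into memory, so it only helps when you genuinely need the whole index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Nest;

public static class SearchAfterPaging
{
    // Sketch: read an index page by page with search_after instead of a scroll context.
    // Assumes hypothetical fields "createdAt" (date) and "id" (unique keyword) exist in every document.
    public static async Task<List<T>> ReadAllAsync<T>(IElasticClient client, string indexName, int pageSize = 1000)
        where T : class
    {
        var results = new List<T>();
        object[] searchAfter = null; // sort values of the last hit on the previous page

        while (true)
        {
            var response = await client.SearchAsync<T>(s =>
            {
                s = s.Index(indexName)
                     .Size(pageSize)
                     // A unique tiebreaker ("id") keeps pages from skipping or repeating hits.
                     .Sort(so => so.Ascending("createdAt").Ascending("id"));
                return searchAfter == null ? s : s.SearchAfter(searchAfter);
            });

            if (!response.IsValid)
                throw new Exception(response.DebugInformation);

            if (response.Hits.Count == 0)
                break; // no more pages

            results.AddRange(response.Documents);
            searchAfter = response.Hits.Last().Sorts.ToArray();
        }

        return results;
    }
}
```

Without a point in time (the Point in Time API, available from Elasticsearch 7.10), pages are read against the live index, so documents written during the loop may shift between pages.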

Arzu Suleymanov

1 Answer


> Is there any way to improve the performance of these requests, or is there another solution?

Yes. The current code reads every single document (MatchAll) and then queries them in memory (AsQueryable). This approach will perform horribly on real-world datasets.

What you should do is build a NEST query and send that to Elasticsearch. That way Elasticsearch does the actual querying and filtering, and returns only the matching documents.
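As a rough sketch (NEST 7.x fluent syntax), the in-memory pipeline from the question could become a single search request that filters, sorts, pages and counts on the server. The question does not show what `RoleManagement`, `Search`, `Range` or `Filter` actually do, nor the shape of `ElasticRequest`, so the class below, the field names (`OrganizationId`, `Status`, `Message`, `CreatedAt`) and the `TaikunRequestQueries.SearchPageAsync` helper are hypothetical stand-ins. The `Terms`/`Term` clauses assume those fields are mapped as `keyword`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Nest;

// Hypothetical document shape; the real ElasticRequest class is not shown in the question.
public class ElasticRequest
{
    public string Id { get; set; }
    public string OrganizationId { get; set; }
    public string Status { get; set; }
    public string Message { get; set; }
    public DateTime CreatedAt { get; set; }
}

public static class TaikunRequestQueries
{
    // Sketch: one request replaces RockAndScroll + AsQueryable + Count + Skip/Take.
    public static async Task<(IReadOnlyCollection<ElasticRequest> Items, long Total)> SearchPageAsync(
        IElasticClient client,
        IEnumerable<string> allowedOrganizationIds, // stand-in for RoleManagement
        string searchText,                           // stand-in for Search
        DateTime? fromDate, DateTime? toDate,        // stand-in for Range
        string status,                               // stand-in for Filter
        int offset = 0, int limit = 50)
    {
        // Filter clauses: exact matches, no scoring.
        var filters = new List<Func<QueryContainerDescriptor<ElasticRequest>, QueryContainer>>
        {
            f => f.Terms(t => t.Field(d => d.OrganizationId).Terms(allowedOrganizationIds))
        };
        if (!string.IsNullOrEmpty(status))
            filters.Add(f => f.Term(d => d.Status, status));
        if (fromDate.HasValue && toDate.HasValue)
            filters.Add(f => f.DateRange(r => r
                .Field(d => d.CreatedAt)
                .GreaterThanOrEquals(fromDate.Value)
                .LessThanOrEquals(toDate.Value)));

        // Full-text clause only when the user typed something.
        var must = new List<Func<QueryContainerDescriptor<ElasticRequest>, QueryContainer>>();
        if (!string.IsNullOrWhiteSpace(searchText))
            must.Add(m => m.Match(mt => mt.Field(d => d.Message).Query(searchText)));

        var response = await client.SearchAsync<ElasticRequest>(s => s
            .Index("taikunrequests")
            .From(offset)
            .Size(limit)
            .TrackTotalHits() // exact total instead of the default 10,000 lower bound
            .Query(q => q.Bool(b => b.Filter(filters.ToArray()).Must(must.ToArray())))
            .Sort(so => so.Descending(d => d.CreatedAt)));

        if (!response.IsValid)
            throw new Exception(response.DebugInformation);

        return (response.Documents, response.Total);
    }
}
```

Only `limit` documents cross the wire, and `Total` replaces `queryable.Count()`. Putting the exact-match conditions in `filter` context skips relevance scoring and lets Elasticsearch cache them; check the overloads against the NEST version you use.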

Stephen Cleary
  • I have only one index, all the data is in it, and it already holds more than 300k documents. That is why I am using MatchAll. Do I need to be more specific and give the exact index name? Either way it will still be 300k documents. – Arzu Suleymanov Aug 10 '21 at 08:07
  • @ArzuSuleymanov: If you're always reading 300k documents into memory, then it will be slow. The only way to make it faster is to reduce the number of documents you're reading. – Stephen Cleary Aug 10 '21 at 11:26