2

Suppose we have a fairly large database of products, e.g. 50K mobile phones, stored in Elasticsearch. I have a product listing page for mobile phones that lists them 10 at a time using pagination, along with their basic details. The page also has a filter section with Brand, Price Range, RAM, Avg. Rating, Release Date, and many more specifications.

When I fetch Samsung phones with 6 GB of RAM, I fire an Elasticsearch query and get the results and their total count. Computing the count is the expensive part: the total depends on the selected filters, and this type of query increases the load on the system.

I want a system that computes the count for a given filter combination once and saves it somewhere, so that I don't need to calculate the count for the same filters again and again. How can I solve this problem, or how should I design my system?

Any reference or article would also be appreciated.

Sanjay Soni
  • Count queries are costly in every type of data source. Beyond query optimization, can you explore other options, such as how important it is to show the exact count for a filter? Would showing something like 10+, 100+, or 1000+ serve your use case? – Abhishek Ranjan Feb 10 '22 at 07:04
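To illustrate the approximate-count idea from the comment above, here is a minimal sketch. It assumes Elasticsearch 7 or later, where the `track_total_hits` request parameter accepts an integer threshold and `hits.total` comes back as an object with `value` and `relation`. The helper names `capped_count_body` and `format_total`, and the use of plain `term` filters, are my own assumptions for illustration, not something from the question.

```python
def capped_count_body(filters: dict, cap: int = 1000) -> dict:
    """Build a search body that stops counting once `cap` matches are found.

    `filters` maps field names to exact values (hypothetical mapping,
    e.g. {"brand": "Samsung", "ram": "6GB"}).
    """
    return {
        "size": 0,                # no documents needed, only the total
        "track_total_hits": cap,  # count accurately only up to `cap`
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
    }


def format_total(total: dict) -> str:
    """Turn an Elasticsearch 7+ `hits.total` object into a display label."""
    value = total["value"]
    # relation "gte" means counting stopped at the threshold,
    # so the real total is at least `value`.
    if total.get("relation") == "gte":
        return f"{value}+"
    return str(value)
```

With this, a filter that matches more than `cap` phones would be shown as "1000+" instead of an exact number, and Elasticsearch can stop counting early.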

3 Answers

0

Elasticsearch provides its own caching mechanisms, and you can use those settings to cache a specific query:

GET /my_index/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "colors"
      }
    }
  }
}

More details are in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-request-cache.html#_enabling_and_disabling_caching_per_request

Keep in mind that your nodes need enough hardware (memory in particular) for Elasticsearch's caches.

Also, if you have multiple data nodes with different hardware configurations, and a few indices on which queries and aggregations are run frequently, you should look into Elasticsearch's hot and cold nodes concept and put your important indices on hot nodes instead of caching everything:
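One detail from the shard request cache documentation is worth keeping in mind: the cache key is derived from the JSON body of the request, so the same filters serialized with keys in a different order will not hit the cache. By default only `size: 0` requests are cached. A rough sketch of building a canonical count request on the application side follows. The function name `build_count_request` and the plain `term` filters are assumptions for illustration.

```python
import json


def build_count_request(filters: dict) -> str:
    """Serialize a count-only search body in a canonical form.

    `filters` maps field names to exact values (hypothetical mapping,
    e.g. {"brand": "Samsung", "ram": "6GB"}).
    """
    # Sort the clauses so the same filter set always yields the same list order.
    clauses = [{"term": {field: value}} for field, value in sorted(filters.items())]
    body = {
        "size": 0,                 # only size=0 requests are cached by default
        "track_total_hits": True,  # ask for the exact total
        "query": {"bool": {"filter": clauses}},
    }
    # sort_keys makes the key order deterministic as well.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))
```

The returned string would be sent as the body of `GET /my_index/_search?request_cache=true`, so that two users picking the same filters in a different order produce byte-identical requests.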

https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x

https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management

gaurav9620
0

Could we add a cache layer that stores the counts for the most-used filters, and increment or decrement those counts whenever the inventory is updated? That way we avoid recomputing them. This is not possible with Elasticsearch's own cache, because it is invalidated whenever the inventory is updated. It would also avoid too many hits on Elasticsearch.
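A minimal in-memory sketch of that idea, under some loud assumptions: filters are exact-match only (no price ranges), each product is a flat dict of string attributes, and the class and method names (`FilterCountCache`, `product_added`, `product_removed`) are made up for illustration. In a real deployment the dict would more likely live in something like Redis, and `compute` would be the actual Elasticsearch count call.

```python
from typing import Callable, Dict, FrozenSet, Tuple

FilterKey = FrozenSet[Tuple[str, str]]


class FilterCountCache:
    """Caches counts per filter combination and adjusts them on inventory changes."""

    def __init__(self) -> None:
        self._counts: Dict[FilterKey, int] = {}

    def get(self, filters: Dict[str, str], compute: Callable[[Dict[str, str]], int]) -> int:
        # A frozenset key makes {"brand": ..., "ram": ...} order-independent.
        key = frozenset(filters.items())
        if key not in self._counts:
            # Cache miss: fall back to the expensive count (e.g. an ES query).
            self._counts[key] = compute(filters)
        return self._counts[key]

    def _adjust(self, product: Dict[str, str], delta: int) -> None:
        attrs = set(product.items())
        for key in self._counts:
            # A product matches a cached filter set if it has every (field, value) pair.
            if key <= attrs:
                self._counts[key] += delta

    def product_added(self, product: Dict[str, str]) -> None:
        self._adjust(product, +1)

    def product_removed(self, product: Dict[str, str]) -> None:
        self._adjust(product, -1)
```

Note that every inventory change scans all cached keys, so this only stays cheap if the cache is limited to the most-used filter combinations, as suggested above.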

-1

Could you please update the post above with a sample of the complex count query you are using? It would help identify the issue you are facing. Thanks!

Shubham
  • Think of it as a simple count query, but one that operates on a large dataset of around 50K mobile phones. I want to cache or precompute the results of those count queries for numerous filter combinations. – Sanjay Soni Jun 21 '20 at 06:45
  • 50K documents is an average size for Elasticsearch, not a very large dataset. Are you running your query on a single index? – Shubham Jun 21 '20 at 07:15
  • Suppose I have sharded the data into multiple indices and at some moment my application receives 10K requests. Calculating the count for each of those 10K requests is very costly, and my CPU consumption rises rapidly. How can I reduce this computation cost for filters with a scalable solution? – Sanjay Soni Jun 21 '20 at 07:36
  • For optimal performance, before you implement a third-party caching mechanism, please take a look at Elasticsearch's own caching capabilities, because that would be cost-friendly and would also improve your performance. Try to keep as few components as possible between the application and Elasticsearch. Elasticsearch itself has also improved a lot and is used by many large organizations for real-time search. You could check out these posts and documents: Field data cache: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-fielddata.html – Shubham Jun 21 '20 at 08:10
  • Shard request cache: https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-request-cache.html Node query cache: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-cache.html You'll also need to run performance tests using JMeter or a similar tool to identify whether any of these mechanisms improves performance. You could also read about the practices followed at eBay: https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/ – Shubham Jun 21 '20 at 08:10
  • @ShubhamSingh - Kindly put this type of suggestion in comments, not in answers. – gaurav9620 Jun 21 '20 at 08:29
  • @gaurav If I had enough points to comment I would have. – Shubham Jun 21 '20 at 08:32
  • @ShubhamSingh Thanks, I will try the built-in caching mechanisms of ES. – Sanjay Soni Jun 21 '20 at 09:41