
I want to aggregate N (= 10 million) records with `percentile_rank`, and I want to run this query live (more than 100 times per second).
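For concreteness, here is a minimal sketch of the request body I have in mind. It assumes Elasticsearch's `percentile_ranks` aggregation; the field name `win_rate`, the aggregation name `rank_of_value`, and the helper function are hypothetical and only there for illustration.

```python
import json


def build_percentile_ranks_query(field, values):
    """Build a search body that runs only a percentile_ranks aggregation."""
    return {
        "size": 0,  # skip the hits; only the aggregation result is needed
        "aggs": {
            "rank_of_value": {  # hypothetical aggregation name
                "percentile_ranks": {
                    "field": field,          # numeric field to rank against
                    "values": list(values),  # values whose rank we want
                }
            }
        },
    }


# "win_rate" is an assumed field name, 0.62 an arbitrary example value.
body = build_percentile_ranks_query("win_rate", [0.62])
print(json.dumps(body, indent=2))
```

Each call would send this body to the index's `_search` endpoint, so the aggregation is evaluated per request unless something on the server side caches it.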

As far as I know, Elasticsearch uses the t-digest algorithm for percentiles, so I assume the total time complexity is O(N log N).

My question is: does this time complexity apply to every single `percentile_rank` query? Is there any optimization such as caching (if no PUT happens) or periodic sorting (maybe once an hour)?
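To make the "periodic sort" idea concrete, this is roughly what I mean, written as a plain in-memory Python sketch. It is not how Elasticsearch works internally (I don't know that); the class and method names are made up. The point is that the O(N log N) sort happens once per rebuild, and each lookup afterwards is a binary search.

```python
import bisect


class SortedSnapshot:
    """Hypothetical snapshot of all values, re-sorted on a schedule."""

    def __init__(self, values):
        # One-time cost per rebuild: O(N log N).
        self._sorted = sorted(values)

    def percentile_rank(self, value):
        """Percentage of stored values that are <= value. O(log N)."""
        if not self._sorted:
            raise ValueError("snapshot is empty")
        count = bisect.bisect_right(self._sorted, value)
        return 100.0 * count / len(self._sorted)

    def top_percent(self, value):
        """Percentage of stored values strictly greater than value."""
        return 100.0 - self.percentile_rank(value)


# Tiny example data set standing in for the 10 million win rates.
snapshot = SortedSnapshot([0.9, 0.1, 0.4, 0.4])
print(snapshot.percentile_rank(0.4))  # 3 of 4 values are <= 0.4
print(snapshot.top_percent(0.4))      # 1 of 4 values is above 0.4
```

With this approach the results are only as fresh as the last rebuild, which is the trade-off I would accept if Elasticsearch offers something equivalent.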

If there is no optimization, can Elasticsearch handle our query scale effectively?

Amit
isbee
  • Being able to call that query "100 times per second" means that ES should consistently respond in less than 10 ms (including the network latency and your app's processing time). I'm curious to know what the use case behind this is. Would you mind sharing a bit more? – Val Jan 13 '21 at 04:46
  • Thanks for your interest in my question. Many clients can simultaneously call something like `GetWinRateTopPercentageOfTotalUsers`. The backend then needs to query a data store (DB, cache, Elasticsearch, or whatever) and calculate where a user falls as a top x% of all users. I am investigating Redis and Elasticsearch as data stores to handle this problem. – isbee Jan 13 '21 at 05:09
  • Well, the response time you'll be able to achieve will of course depend on many different factors, among which the volume of data, the complexity of the query you're running, the hardware ES is running on, etc. – Val Jan 13 '21 at 05:12
  • Of course I should take the things you mentioned into account, but I just want to understand what Elasticsearch is capable of. If `percentile_rank` is O(N log N) and N > 10M, can Elasticsearch handle it in under 1 second? If so, what hardware spec would it need (as you said)? – isbee Jan 13 '21 at 05:20
  • No one will be able to tell you that as every use case is different. My take is that it's always possible to scale up to the point where you'll get your constraint satisfied. The easiest way to figure out is to test it out on your real dataset. – Val Jan 13 '21 at 05:23
