I want to estimate a quantile of some data. The data is so large that it won't fit in memory, and new data keeps arriving. Does anyone know an algorithm to monitor the quantile(s) of the data observed so far with very limited memory and computation? I found the P² algorithm useful, but it does not work very well for my data, which has an extremely heavy-tailed distribution.
- You mention that your data is extremely heavy-tailed in its distribution. Naturally, the more we know about the data, the better we are able to tune an algorithm to the problem at hand. Is there anything else you can say about your data? – Richard Dec 08 '12 at 16:44
- Also, are you looking to estimate rather low quantiles or high quantiles? And do you want an exact solution, or will an approximation do? – mitchus Jan 12 '13 at 10:01
- Have you tried transforming the data (e.g. arctan) so as to diminish the influence of outliers? Then you can backtransform any quantile estimate... – Quartz Oct 16 '13 at 15:23
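
For what it's worth, the transform-and-back-transform idea in the last comment works because quantiles are preserved under monotone maps: the q-quantile of arctan(X) is arctan of the q-quantile of X. A minimal sketch of that point (the Pareto sample and variable names are only illustrative, not from the question; a full sort stands in for whatever streaming estimator you actually use):

```python
import math
import random

# Heavy-tailed sample, purely for illustration.
random.seed(0)
data = [random.paretovariate(1.5) for _ in range(100_000)]

q = 0.99
# Estimate the quantile on the arctan scale, where the tail is compressed,
# then map the estimate back with tan.
est_atan = sorted(math.atan(x) for x in data)[int(q * len(data))]
print(math.tan(est_atan))                # back-transformed estimate
print(sorted(data)[int(q * len(data))])  # direct estimate, for comparison
```

In practice you would feed arctan(x) into the streaming estimator (P², the binning approach in the answer below, etc.) instead of sorting.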
1 Answer
Look into dividing the value space into bins, each bin holding a count of the values that fall in its range. You can make the bins narrower around the point where you expect the looked-for quantile to lie, and wider elsewhere. If you make the number of bins large enough, this should work quite well: memory stays fixed at one counter per bin, and each new value only increments a single count.
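
A minimal sketch of that idea, assuming you can guess a rough value range up front (the bin edges, class name, and Pareto test stream are illustrative, not part of the answer):

```python
import bisect
import random

class BinnedQuantileEstimator:
    """Approximate streaming quantiles with one counter per bin."""

    def __init__(self, edges):
        # `edges` are the bin boundaries; values below edges[0] or above
        # edges[-1] land in two open-ended tail bins.
        self.edges = sorted(edges)
        self.counts = [0] * (len(self.edges) + 1)
        self.total = 0

    def add(self, x):
        # O(log #bins) work per observation, O(#bins) memory overall.
        self.counts[bisect.bisect_right(self.edges, x)] += 1
        self.total += 1

    def quantile(self, q):
        # Walk the cumulative counts until we reach rank q*total, then
        # return the midpoint of the bin we stopped in (tail bins just
        # return the nearest edge).
        target = q * self.total
        cum = 0
        for i, c in enumerate(self.counts):
            cum += c
            if cum >= target:
                lo = self.edges[max(i - 1, 0)]
                hi = self.edges[min(i, len(self.edges) - 1)]
                return (lo + hi) / 2.0
        return self.edges[-1]

# Finer bins where you expect the quantile, coarse bins in the heavy tail.
random.seed(0)
est = BinnedQuantileEstimator([0, 0.5, 1, 1.5, 2, 3, 5, 10, 100, 10_000])
for x in (random.paretovariate(1.5) for _ in range(1_000_000)):
    est.add(x)
print(est.quantile(0.5))  # approximate median of the stream so far
```

The error of the estimate is bounded by the width of the bin the quantile falls in, which is why it pays to place the narrow bins where you expect the quantile to be.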

– maniek