
I understand that for both the conventional and the weighted reservoir sampling algorithms, the user has to specify the size of the reservoir as input. Is there any reservoir sampling algorithm that takes only a uniform sampling ratio as input, meaning that the user knows neither the size of the stream a priori nor the resulting sample size? I have looked around but with no luck.

Thanks for any help!!

rabbit686

1 Answer


If you know neither the size of the population nor the desired sample size, the only possible streaming algorithm is to select each element independently with probability p. That won't guarantee that the selected sample has exactly pN elements, but it will be unbiased and approximately the right size.
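
For illustration, here is a minimal Python sketch of that per-element selection (often called Bernoulli sampling). The function name `bernoulli_sample` and the optional `rng` parameter are my own choices for the example, not part of any library. Under this scheme the sample size follows a Binomial(N, p) distribution, so it has mean pN and variance Np(1 - p).

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def bernoulli_sample(
    stream: Iterable[T], p: float, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Yield each item of `stream` independently with probability `p`.

    Hypothetical helper for illustration: it needs neither the stream
    length nor a target sample size, only the sampling ratio `p`.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    for item in stream:
        # rng.random() is uniform on [0, 1), so this test succeeds
        # with probability p for every item, independently.
        if rng.random() < p:
            yield item


if __name__ == "__main__":
    # Keep roughly 10% of a stream whose length is never inspected.
    sample = list(bernoulli_sample(range(100_000), 0.1))
    print(len(sample))  # close to 10,000, but not exactly
```

Because it is a generator, the sample is emitted as the stream is consumed and memory use depends only on what the caller keeps.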

Having said that, I think it is very rare to have a use case that requires a sample of x% of an unknown population. Much more commonly, the sample size is fixed by the cost of processing (or storing) it, in which case reservoir sampling will fill the desired sample size regardless of the population size.
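
For comparison, a sketch of the fixed-size case, assuming the classic "Algorithm R" form of reservoir sampling. The name `reservoir_sample` and its parameters are again just illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to `k` items from `stream`.

    Hypothetical helper for illustration (Algorithm R). The stream
    length does not need to be known in advance.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i (inclusive); the new item
            # replaces a reservoir entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Always 10 items, whatever the stream length turns out to be.
    print(reservoir_sample(range(1_000_000), 10))
```

If the stream has fewer than `k` items, the sketch simply returns all of them.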

rici