I am taking a 1% Bernoulli random sample from an Athena table. However, the size of the sample table returned is only 0.4% of the original table. Both are in Parquet format. Why is that?
1 Answer
The Bernoulli option selects each row independently with the given probability. Only on average will you get the given percentage of output rows; any individual query will return a varying number of rows. As a rule of thumb, if your table has N rows and you sample with probability p, you can expect the output table to have about pN rows, give or take roughly sqrt(pN) (more precisely, the standard deviation is sqrt(N p (1 - p))).
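
To make the rule of thumb concrete, here is a minimal Python sketch that simulates Bernoulli sampling and compares the simulated row counts with the binomial mean and standard deviation. It does not reproduce Athena's actual implementation; the table size (100,000 rows), the number of repetitions, and the function names are hypothetical, chosen only for illustration.

```python
import math
import random


def expected_sample_rows(n_rows: int, p: float) -> tuple[float, float]:
    """Return (mean, standard deviation) of the sampled row count.

    Each row is kept independently with probability p, so the count
    follows a binomial distribution.
    """
    mean = n_rows * p
    std = math.sqrt(n_rows * p * (1.0 - p))
    return mean, std


def bernoulli_sample_count(n_rows: int, p: float, rng: random.Random) -> int:
    """Keep each row independently with probability p and count the survivors."""
    return sum(1 for _ in range(n_rows) if rng.random() < p)


# Hypothetical numbers: a 100,000-row table sampled at 1%.
rng = random.Random(0)  # fixed seed so repeated runs give the same counts
n_rows, p = 100_000, 0.01

mean, std = expected_sample_rows(n_rows, p)
counts = [bernoulli_sample_count(n_rows, p, rng) for _ in range(20)]

print(f"expected rows: {mean:.0f} +/- {std:.1f}")
print("simulated counts:", counts)
```

Each simulated count should land within a few standard deviations of pN, which shows how small the run-to-run variation in row count is for a table of this size.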

Nicolas Busca