0

I am taking a 1% Bernoulli random sample from an Athena table. However, the size of the sample table returned is only 0.4% of the original table. Both are in Parquet format. Why is that?

lsl__

1 Answer

1

The BERNOULLI option selects each row independently with the given probability. Only on average will you get the given percentage of output rows; any individual query will return a varying number of rows. As a rule of thumb, if your table has N rows and you sample with probability p, you can expect about pN rows on average, with a typical fluctuation of roughly +/- sqrt(pN).
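To make the variation concrete, here is a minimal simulation sketch in plain Python. It does not call Athena; it only mimics the idea of keeping each row independently with probability p, and the table size (100,000 rows) and the helper names are made up for illustration. Under that assumption the row count follows a binomial distribution with mean pN and standard deviation sqrt(pN(1 - p)).

```python
import math
import random


def bernoulli_sample_count(n_rows: int, p: float, rng: random.Random) -> int:
    """Count rows kept when each row is retained independently with probability p."""
    return sum(1 for _ in range(n_rows) if rng.random() < p)


def expected_and_std(n_rows: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of a Binomial(n_rows, p) row count."""
    mean = n_rows * p
    std = math.sqrt(n_rows * p * (1 - p))
    return mean, std


# Hypothetical table size and sampling rate, chosen only for illustration.
rng = random.Random(0)
n_rows, p = 100_000, 0.01

mean, std = expected_and_std(n_rows, p)
counts = [bernoulli_sample_count(n_rows, p, rng) for _ in range(20)]

print(f"expected about {mean:.0f} rows, standard deviation about {std:.1f}")
print("simulated row counts:", counts)
```

Each simulated count lands near 1,000 but rarely exactly on it. The relative spread is about 1/sqrt(pN), so it shrinks as the table grows.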

Nicolas Busca