What are the sampling methods in Spark? Why not reservoir sampling?

Question

I know reservoir sampling can be applied in parallel, but Spark seems to use other sampling methods that I have no idea about. Could someone describe them briefly?
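As far as I understand, `RDD.sample(withReplacement, fraction, seed)` is specified by a fraction rather than a fixed size. Without replacement, `fraction` is the probability that each element is chosen (Bernoulli sampling). With replacement, it is the expected number of times each element is chosen (Poisson sampling). Each partition is sampled independently, so no counts have to be exchanged between workers; the trade-off is that the sample size is only approximately `fraction * n`. The plain-Python sketch below illustrates the idea, assuming a "partition" is just a list; `sample_partitions` and the other helpers are hypothetical names, not Spark APIs.

```python
import math
import random


def bernoulli_sample(partition, fraction, rng):
    """Without replacement: keep each element independently with probability `fraction`."""
    return [x for x in partition if rng.random() < fraction]


def poisson_count(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def poisson_sample(partition, fraction, rng):
    """With replacement: emit each element Poisson(fraction) times."""
    out = []
    for x in partition:
        out.extend([x] * poisson_count(fraction, rng))
    return out


def sample_partitions(partitions, with_replacement, fraction, seed):
    """Hypothetical stand-in for RDD.sample: every partition is sampled on its own."""
    master = random.Random(seed)
    sampler = poisson_sample if with_replacement else bernoulli_sample
    result = []
    for partition in partitions:
        # Each partition gets its own generator seeded from the master seed,
        # so in a cluster the partitions could run on different workers.
        rng = random.Random(master.getrandbits(64))
        result.extend(sampler(partition, fraction, rng))
    return result


if __name__ == "__main__":
    partitions = [list(range(i * 1000, (i + 1) * 1000)) for i in range(3)]
    without = sample_partitions(partitions, False, 0.1, seed=42)
    with_rep = sample_partitions(partitions, True, 0.1, seed=42)
    # Both sizes are near 0.1 * 3000 = 300 but vary with the seed.
    print(len(without), len(with_rep))
```

If you need an exact number of elements, `takeSample(withReplacement, num, seed)` returns a fixed-size array to the driver instead.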

According to @Tristan's answer, I guess the reason for not using reservoir sampling is to keep the classes balanced. But I went through the source code and found nothing about labels.
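Reservoir sampling is not label-aware either. It can indeed run in parallel, but in the sketch below every partition has to report its element count, and the driver then has to merge the partial reservoirs with count-aware weights (here a hypergeometric split). That is an extra coordination step that fraction-based Bernoulli sampling does not need. (If I remember correctly, Spark does have an internal reservoir sampler, `SamplingUtils.reservoirSampleAndCount`, used by `RangePartitioner` when sketching keys for `sortByKey`.) The helper names are hypothetical and this is plain Python, not Spark's implementation.

```python
import random


def reservoir_sample(partition, k, rng):
    """Algorithm R over one partition; returns (uniform sample of min(k, n) items, n)."""
    reservoir = []
    n = 0
    for x in partition:
        n += 1
        if len(reservoir) < k:
            reservoir.append(x)
        else:
            j = rng.randrange(n)  # uniform index in [0, n)
            if j < k:
                reservoir[j] = x
    return reservoir, n


def merge_reservoirs(a, b, k, rng):
    """Combine two (reservoir, count) pairs into one uniform sample over both partitions."""
    (res_a, n_a), (res_b, n_b) = a, b
    size = min(k, n_a + n_b)
    # The number of items that should come from `a` is hypergeometric;
    # simulate it by drawing `size` items without replacement from the counts.
    left_a, left_b, take_a = n_a, n_b, 0
    for _ in range(size):
        if rng.randrange(left_a + left_b) < left_a:
            take_a += 1
            left_a -= 1
        else:
            left_b -= 1
    merged = rng.sample(res_a, take_a) + rng.sample(res_b, size - take_a)
    return merged, n_a + n_b


def parallel_reservoir(partitions, k, seed):
    """Hypothetical helper: per-partition reservoirs, then a count-aware merge."""
    master = random.Random(seed)
    # Step 1 (on the workers): one reservoir plus one element count per partition.
    partials = [reservoir_sample(p, k, random.Random(master.getrandbits(64)))
                for p in partitions]
    # Step 2 (on the driver): fold the partial results together using the counts.
    result = ([], 0)
    for partial in partials:
        result = merge_reservoirs(result, partial, k, master)
    return result[0]


if __name__ == "__main__":
    parts = [list(range(0, 50)), list(range(50, 1000))]
    print(sorted(parallel_reservoir(parts, 5, seed=1)))  # exactly 5 distinct items
```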

score -1 · Answer 1 · answered May 25 '16 at 06:02


I know that stratified sampling exists.
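Stratified sampling is what Spark offers when the strata (for example class labels) matter. On a pair RDD, `sampleByKey(withReplacement, fractions)` takes a map from each key to its own sampling rate, and `sampleByKeyExact` makes extra passes to hit exact per-key sample sizes. The label handling lives in those methods, which may explain why plain `sample` has nothing about labels. Below is a minimal plain-Python sketch of the without-replacement case, assuming per-key Bernoulli sampling; `sample_by_key` is a hypothetical name, not Spark's code.

```python
import random
from collections import Counter


def sample_by_key(pairs, fractions, seed):
    """Stratified Bernoulli sampling: keep each (key, value) with probability fractions[key].

    Every key must appear in `fractions`; a missing key raises KeyError here.
    """
    rng = random.Random(seed)
    return [(key, value) for key, value in pairs if rng.random() < fractions[key]]


if __name__ == "__main__":
    # Imbalanced labels: 9000 examples of class 0, 1000 of class 1.
    data = [(0, i) for i in range(9000)] + [(1, i) for i in range(1000)]
    # Down-sample the majority class and keep most of the minority class.
    sample = sample_by_key(data, {0: 0.1, 1: 0.9}, seed=42)
    print(Counter(label for label, _ in sample))  # roughly 900 of each class
```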


RoyaumeIX


You may also check out this link: https://databricks.com/blog/2014/08/27/statistics-functionality-in-spark.html – RoyaumeIX May 25 '16 at 06:04
