I know reservoir sampling can be applied in parallel, but Spark seems to use other sampling methods that I am not familiar with. Could someone describe them briefly?
According to @Tristan's answer, I guess the purpose of not using reservoir sampling is to keep the classes balanced. But I went through the source code and found nothing about labels.
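
For reference, by reservoir sampling I mean roughly the following. This is only a minimal single-stream sketch of the textbook algorithm (Algorithm R), not anything taken from the Spark source; the function name `reservoir_sample` and its parameters are my own, chosen for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Illustrative sketch of Algorithm R; names are hypothetical and this
    is not Spark's implementation.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with
            # probability k / (i + 1), replacing a random earlier pick.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

It makes a single pass and always returns exactly `k` items (or the whole stream if it has fewer than `k`), and nothing in it looks at labels either.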