I'm trying to use the `takeSample()` function in Spark. Its parameters are the data, the number of samples to take, and the seed, but I don't want to use the seed: I want a different answer every time. I can't figure out how to do that. I tried using `System.nanoTime` as the seed value, but it gave an error, I think because the data types didn't match. Is there another function similar to `takeSample()` that can be used without a seed? Or is there another way to call `takeSample()` so that I get a different output every time?
7

Jorge

Prateek Kulkarni
3 Answers
8
`System.nanoTime` is of type `Long`, while the seed expected by `takeSample` is of type `Int`. Hence, `takeSample(..., System.nanoTime.toInt)` should work.
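A minimal sketch of that narrowing conversion, runnable without Spark. The names `NanoSeed` and `freshSeed` are hypothetical, and the commented-out call assumes an existing RDD named `rdd` and the older `takeSample(withReplacement, num, seed)` signature with an `Int` seed.

```scala
object NanoSeed {
  // System.nanoTime returns a Long; toInt keeps only the low 32 bits,
  // which is acceptable for a seed that only needs to vary between runs.
  def freshSeed(): Int = System.nanoTime.toInt

  def main(args: Array[String]): Unit = {
    val seed = freshSeed()
    println(s"seed = $seed")
    // Assumed usage with an existing RDD named rdd (not defined here):
    // val sample = rdd.takeSample(false, 10, seed)
  }
}
```

Because `toInt` discards the high bits, the resulting seed can be negative or repeat after the counter wraps, which is harmless when the only goal is a different sample on each run.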

Malte Schwerhoff
1 In Scala, `.toInt` should be preferred over `.intValue` – Régis Jean-Gilles Feb 04 '13 at 14:13
1
`System.nanoTime` returns `Long`, whereas `takeSample` expects an `Int`. You can feed `scala.util.Random.nextInt` as a seed value to the `takeSample` function.
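A small sketch of drawing the seed from `scala.util.Random`, again without a Spark dependency. `RandomSeed` and `freshSeed` are hypothetical names, and the commented-out call assumes an existing RDD named `rdd` and an `Int` seed parameter.

```scala
import scala.util.Random

object RandomSeed {
  // The Random companion object is itself a generator that is seeded
  // automatically, so nextInt() generally differs from run to run.
  def freshSeed(): Int = Random.nextInt()

  def main(args: Array[String]): Unit = {
    val seed = freshSeed()
    println(s"seed = $seed")
    // Assumed usage with an existing RDD named rdd (not defined here):
    // val sample = rdd.takeSample(false, 10, seed)
  }
}
```

This avoids the `Long` to `Int` conversion entirely, since `nextInt()` already returns an `Int`.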

om-nom-nom
1
As of Spark version 1.0.0, the `seed` parameter is optional. See https://issues.apache.org/jira/browse/SPARK-1438.
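A sketch of what the seedless call could look like, assuming Spark 1.0.0 or later is on the classpath. The object name, app name, local master setting and the sample data are all illustrative choices, not part of the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeSampleNoSeed {
  // Draws two samples of 5 elements without passing a seed.
  def sampleTwice(sc: SparkContext): (Array[Int], Array[Int]) = {
    val rdd = sc.parallelize(1 to 100)
    // With the seed omitted, Spark chooses one itself, so repeated
    // calls will generally return different elements.
    val first = rdd.takeSample(withReplacement = false, num = 5)
    val second = rdd.takeSample(withReplacement = false, num = 5)
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("takeSample-no-seed").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (first, second) = sampleTwice(sc)
      println(first.mkString(", "))
      println(second.mkString(", "))
    } finally {
      sc.stop() // always release the local Spark context
    }
  }
}
```

On versions before 1.0.0 the seed still has to be supplied explicitly, as described in the other answers.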

Josh Milthorpe