7

I'm trying to use the takeSample() function in Spark. Its parameters are the data, the number of samples to take, and the seed, but I don't want to use a seed: I want a different answer every time, and I can't figure out how to do that. I tried using System.nanoTime as the seed value, but it gave an error, I think because the data type didn't match. Is there any other function similar to takeSample() that can be used without a seed? Or is there some other way to use takeSample() so that I get a different output every time?

Jorge
Prateek Kulkarni

3 Answers

8

System.nanoTime is of type Long, whereas the seed expected by takeSample is of type Int. Hence, takeSample(..., System.nanoTime.toInt) should work.
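To illustrate the conversion, here is a minimal sketch in plain Scala with no Spark dependency. The object name `SeedSketch` and the helper `seedFromClock` are made up for this example, and the commented-out `takeSample` call assumes a pre-1.0 Spark RDD named `rdd`, which is not defined here.

```scala
object SeedSketch {
  // System.nanoTime returns a Long; toInt narrows it by keeping only the
  // low 32 bits, so the resulting seed may be negative.
  def seedFromClock(): Int = System.nanoTime.toInt

  def main(args: Array[String]): Unit = {
    val seed: Int = seedFromClock()
    println(s"seed derived from the clock: $seed")
    // Hypothetical use with an RDD named `rdd` (assumed, not defined here):
    // rdd.takeSample(false, 10, seed)
  }
}
```

Because the narrowing discards the high bits, two calls made far apart can in principle produce the same Int, but for the purpose of getting a different sample on each run that is unlikely to matter.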

Malte Schwerhoff
1

System.nanoTime returns a Long, whereas takeSample expects an Int.
You can feed scala.util.Random.nextInt as a seed value to the takeSample function.

om-nom-nom
1

As of Spark version 1.0.0, the seed parameter is optional. See https://issues.apache.org/jira/browse/SPARK-1438.
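For example, with Spark 1.0.0 or later on the classpath, a call that omits the seed might look like the sketch below. The object name, the helper `sampleWithoutSeed`, the app name, the local master setting, and the sample data are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeSampleWithoutSeed {
  // Draw 5 elements without replacement; no seed is passed, so Spark
  // falls back to its default seed argument for each call.
  def sampleWithoutSeed(sc: SparkContext): Array[Int] = {
    val data = sc.parallelize(1 to 100) // illustrative data
    data.takeSample(withReplacement = false, num = 5)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("take-sample-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Two calls will usually print different samples.
      println(sampleWithoutSeed(sc).mkString(", "))
      println(sampleWithoutSeed(sc).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```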

Josh Milthorpe