
I'm building a web app with Flask that has some real-time machine learning functionality. I want to use Spark MLlib to analyze data and return the results inside the app in real time. I then found Livy, which seemed like a good fit for my project. After reading the Livy documentation, I understood that I can send a code snippet to the Spark cluster through Livy like this:

data = {'code': textwrap.dedent("""
val NUM_SAMPLES = 100000;
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random();
  val y = Math.random();
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _);
println(\"Pi is roughly \" + 4.0 * count / NUM_SAMPLES)
""")}

My situation is that I have a huge amount of data coming from the backend of my app (thousands of lines of JSON-formatted data) that I want to analyze with Spark. My question is: how can I also pass that data to Spark through Livy? I can't find any working example with a large dataset.
