
I'm playing around with Livy/Spark and am a little confused about how to use some of it. There's an example in Livy's examples folder of building jobs that get uploaded to Spark. I like the interfaces being used, but I want to talk to Livy/Spark over HTTP, since I don't have a Java client. It also seems that if I use the LivyClient to upload jars, they only exist within that one Spark session. Is there a way to upload Livy jobs to Spark and have them persist across all of Spark? Or would it be better to build those jobs/apps in Spark instead?

Honestly, I'm trying to figure out the best approach. I want to be able to do interactive things via the shell, but I also want to build custom jobs for algorithms that aren't available in Spark and that I'd use frequently. I'm not sure how to tackle this. Any thoughts? How should I be using Livy? Just as the REST service to Spark, and then handle building custom apps/methods in Spark?

For example:

Say I have some JavaScript application with data I can load, and I want to run algorithm X on it. Algorithm X may or may not be implemented in Spark, but by pressing a button I want to get that data into Spark, whether that means putting it into HDFS or pulling it from Elasticsearch or whatever. With Livy, I'd want to call some REST command that does that and then runs the particular algorithm. What's the standard way of doing this?

Thanks

Exuro

3 Answers


Livy doesn't support file uploads yet. You have to provide valid file paths for sessions or batch jobs, and those files have to be in HDFS. So essentially you keep your scripts or jars in HDFS and then use Livy to launch a batch or interactive job referencing those files.
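For example, launching a batch job is just an HTTP POST to Livy's /batches endpoint pointing at a jar that already sits in HDFS. Here's a minimal sketch using Java 11's HttpClient; the Livy host, jar path, and class name are placeholders you'd replace with your own:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitLivyBatch {
    public static void main(String[] args) throws Exception {
        // POST /batches asks Livy to spark-submit a jar that is already in HDFS.
        String payload = "{\"file\": \"hdfs:///jobs/algorithms.jar\","
                       + " \"className\": \"com.example.AlgorithmX\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://livy-server:8998/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON describing the queued batch
    }
}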

Livy - Cloudera

Livy - Apache

Edit: Livy is being incubated by Apache and they are planning to add a new API to support resource uploading. Check this.

Anis Tissaoui

The API below can be used to upload the jars once, when your application starts.

// Assuming the Apache Livy client API (org.apache.livy):
import java.net.URI;
import org.apache.livy.LivyClient;
import org.apache.livy.LivyClientBuilder;

LivyClient client = new LivyClientBuilder(false).setURI(uri).setAll(config).build();
client.addJar(new URI(UPLOAD_JAR_PATH)).get();

The LivyClient instance can live in application scope. UPLOAD_JAR_PATH is an HDFS path where the jars are present and accessible by the Livy server.

Then use the same LivyClient instance to submit multiple jobs.

client.submit(job).get();
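
For illustration, here's a minimal sketch of what such a job could look like, again assuming the Apache Livy client API (org.apache.livy); the job itself is hypothetical:

import java.util.Arrays;
import org.apache.livy.Job;
import org.apache.livy.JobContext;

// Hypothetical job: counts the elements of a small RDD on the cluster.
public class CountJob implements Job<Long> {
    @Override
    public Long call(JobContext jc) throws Exception {
        return jc.sc().parallelize(Arrays.asList(1, 2, 3, 4, 5)).count();
    }
}

Calling client.submit(new CountJob()).get() then blocks until the cluster returns the count. Note that the jar containing CountJob has to be one of the jars added via addJar above.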
dassum

You can start a session with

spark.jars = "hdfs:///some/hdfs/location/file.jar"

That way, you can add as much boilerplate code as you like to any session.
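
A sketch of that session request over Livy's REST API, again with Java 11's HttpClient (the host is a placeholder, and the spark.jars path is taken from above):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateLivySession {
    public static void main(String[] args) throws Exception {
        // POST /sessions starts an interactive session; spark.jars puts the
        // HDFS jar's classes on the classpath of every statement you run.
        String payload = "{\"kind\": \"spark\","
                       + " \"conf\": {\"spark.jars\": \"hdfs:///some/hdfs/location/file.jar\"}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://livy-server:8998/sessions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON describing the new session
    }
}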

niid