
I'm wondering if there are any important differences between using the SBT console and spark-shell for interactively developing new code for a Spark project (notebooks are not really an option with the server firewalls).

  • Both can import project dependencies, but for me SBT is a little more convenient: it automatically brings in everything declared in build.sbt, while spark-shell can use the --jars, --packages, and --repositories command-line arguments (see the build.sbt sketch after this list).
  • SBT has the handy initialCommands setting to automatically run lines at startup. I use this for initializing the SparkContext.
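
For concreteness, here is a minimal build.sbt sketch covering both points. The Scala and Spark versions and the coordinates are illustrative assumptions, not anything prescribed:

    // build.sbt -- versions are illustrative
    scalaVersion := "2.12.18"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.5.0",
      "org.apache.spark" %% "spark-sql"  % "3.5.0"
    )

    // Run automatically when `sbt console` starts, so a session
    // and context are ready for interactive work.
    console / initialCommands := """
      import org.apache.spark.sql.SparkSession
      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("sbt-console")
        .getOrCreate()
      val sc = spark.sparkContext
    """

The rough spark-shell counterpart is passing the same artifacts on the command line, e.g. spark-shell --jars extra.jar --packages org.example:mylib_2.12:1.0.0 (placeholder coordinates).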

Are there others?

andrew

2 Answers


In theory, with SBT you need not install Spark itself at all: the Spark libraries come in as ordinary dependencies.
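
To make that concrete: with the Spark artifacts on the classpath (as in the build.sbt sketch above), a plain `sbt console` can spin up a local session with no separately installed Spark distribution. A minimal sketch:

    // Inside `sbt console`, with spark-sql on the classpath:
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")        // Spark runs embedded in this JVM
      .appName("sbt-console")
      .getOrCreate()

    // Sanity check: a tiny job executed entirely locally.
    spark.range(1, 5).selectExpr("id * 2 AS doubled").show()

    spark.stop()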

I use Databricks.

thebluephantom

In my experience, sbt pulls in external jars out of the box, while spark-shell provides a set of imports and contexts out of the box. I prefer spark-shell because it follows the standard you need to adhere to when building the spark-submit session.

For running the code in production, you need to build it into jars and call them via spark-submit. To get there, you package it via sbt (a compilation check) and run the spark-submit call (a logic check).

You can develop using either tool, but you should code as if you did not have the advantages of sbt (pulling in the jars) or spark-shell (providing the imports and contexts), because spark-submit gives you neither; see the sketch below.
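
A minimal sketch of what that looks like, with nothing assumed from either REPL; the package, class, and jar names are hypothetical:

    package com.example

    import org.apache.spark.sql.SparkSession

    // Build and run (hypothetical names):
    //   sbt package
    //   spark-submit --class com.example.MyJob --master local[*] \
    //     target/scala-2.12/myproject_2.12-0.1.jar
    object MyJob {
      def main(args: Array[String]): Unit = {
        // Explicit session creation: spark-submit does not hand you
        // `spark` or `sc` the way spark-shell does.
        val spark = SparkSession.builder()
          .appName("MyJob")
          .getOrCreate()

        import spark.implicits._   // explicit, not preloaded as in the shell

        val result = Seq(1, 2, 3).toDF("n").selectExpr("sum(n) AS total")
        result.show()

        spark.stop()
      }
    }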

afeldman