I'm pretty new to functional programming and don't have an imperative programming background. Running through some basic scala/spark tutorials online and having some difficulty submitting a Scala application through spark-submit.
In particular I'm getting a java.lang.ArrayIndexOutOfBounds 0 Exception, which I have researched and found out that the array element at position 0 is the culprit. Looking into it further, I saw that some basic debugging could tell me if the Main application was actually picking up the argument at runtime - which it was not. Here is the code:
import org.apache.spark.{SparkConf, SparkContext}
object SparkMeApp {
def main(args: Array[String]) {
try {
//program works fine if path to file is hardcoded
//val logfile = "C:\\Users\\garveyj\\Desktop\\NetSetup.log"
val logfile = args(0)
val conf = new SparkConf().setAppName("SparkMe Application").setMaster("local[*]")
val sc = new SparkContext(conf)
val logdata = sc.textFile(logfile, 2).cache()
val numFound = logdata.filter(line => line.contains("found")).count()
val numData = logdata.filter(line => line.contains("data")).count()
println("")
println("Lines with found: %s, Lines with data: %s".format(numFound, numData))
println("")
}
catch {
case aoub: ArrayIndexOutOfBoundsException => println(args.length)
}
}
}
To submit the application using spark-submit I use:
spark-submit --class SparkMeApp --master "local[*]" --jars target\scala-2.10\firstsparkapplication_2.10-1.0.jar NetSetup.log
...where NetSetup.log is in the same directory as where I'm submitting the application. The output of the application is simply: 0. If I remove the try/catch, the output is:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at SparkMeApp$.main(SparkMeApp.scala:12)
at SparkMeApp.main(SparkMeApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
It's worth pointing out that the application runs fine if I remove the argument and hard code the path to the log file. Don't really know what I'm missing here. Any direction would be appreciated. Thanks in advance!