I have an input file that uses a custom record delimiter, and I pass it to newAPIHadoopFile to read it as an RDD[String]. The file resides under the project's resource directory. The following code works well when run from the Eclipse IDE.
    val path = this.getClass()
      .getClassLoader()
      .getResource(fileName)
      .toURI().toString()
    val conf = new org.apache.hadoop.conf.Configuration()
    conf.set("textinputformat.record.delimiter", recordDelimiter)
    return sc.newAPIHadoopFile(
        path,
        classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat],
        classOf[org.apache.hadoop.io.LongWritable],
        classOf[org.apache.hadoop.io.Text],
        conf)
      .map(_._2.toString)
However, when I run it with spark-submit (as an uber jar) as follows:
    spark-submit /Users/anon/Documents/myUber.jar
I get the following error:
    Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/Users/anon/Documents/myUber.jar!/myhome-data.json
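The difference between the two runs seems to be the shape of the URI that getResource returns. A minimal sketch, runnable outside Spark, that compares how java.net.URI parses the two forms is shown here. The object name ResourceUriCheck, the helper describe, and the on-disk path are made up for illustration; only the jar: URI is taken from the error message above.

```scala
import java.net.URI

object ResourceUriCheck {
  // Summarises how java.net.URI parses a resource location.
  def describe(uri: URI): String =
    s"scheme=${uri.getScheme}, opaque=${uri.isOpaque}, path=${uri.getPath}"

  def main(args: Array[String]): Unit = {
    // Hypothetical location of the resource when it is a plain file on disk,
    // as it would be when running from the IDE.
    val fromDisk = new URI("file:/tmp/project/target/classes/myhome-data.json")
    // Location reported in the error message, where the resource is inside the uber jar.
    val fromJar = new URI("jar:file:/Users/anon/Documents/myUber.jar!/myhome-data.json")

    println(describe(fromDisk))
    println(describe(fromJar))
  }
}
```

If the jar: form is reported as opaque with no path component, that would be consistent with the "Relative path in absolute URI" message, since the input path handed to newAPIHadoopFile would then not look like a hierarchical file system path.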
Any suggestions, please?
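One direction that might work, offered only as an unverified sketch: copy the classpath resource to a temporary file on the local file system and pass that file's URI to newAPIHadoopFile instead. The object ResourceToTempFile and its method copyToTempFile are hypothetical names, and this assumes the job runs in local mode (or that every executor can see the same temporary path), which may not hold on a real cluster.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object ResourceToTempFile {
  // Copies the given stream into a new temporary file and returns its path.
  // The stream is closed afterwards; the file is removed when the JVM exits.
  def copyToTempFile(in: InputStream, suffix: String): Path = {
    val tmp = Files.createTempFile("resource-", suffix)
    tmp.toFile.deleteOnExit()
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp
  }
}
```

With this helper, the path in the code above could be obtained from getClassLoader().getResourceAsStream(fileName) and ResourceToTempFile.copyToTempFile(stream, ".json").toUri.toString, which yields a plain file: URI rather than a jar: one.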