
I have an Ammonite script which creates a Spark context:

#!/usr/local/bin/amm

import ammonite.ops._
import $ivy.`org.apache.spark:spark-core_2.11:2.0.1`
import org.apache.spark.{SparkConf, SparkContext}

@main
def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Demo"))
}

When I run this script, it throws an error:

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: org.apache.spark.SparkException: Error while locating file spark-version-info.properties
...
Caused by: java.lang.NullPointerException
    at java.util.Properties$LineReader.readLine(Properties.java:434)
    at java.util.Properties.load0(Properties.java:353)

The script isn't run from the Spark installation directory and has no knowledge of it, nor of the resources where this version information is packaged - it only knows about the ivy dependencies. So perhaps the issue is that this resource isn't on the classpath provided by the ivy dependencies. I have seen other Spark "standalone scripts", so I was hoping the same approach would work here.
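To test that hypothesis, a quick probe (my own sketch, assuming Spark consults the script's context classloader, which the source excerpt below suggests) is:

import java.io.InputStream

// Probe whether the ivy-resolved classpath exposes the resource Spark needs.
val res: InputStream = Thread.currentThread().getContextClassLoader
  .getResourceAsStream("spark-version-info.properties")
println(if (res == null) "resource NOT on classpath" else "resource found")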

I poked around a bit to try to understand what was happening. I was hoping I could programmatically hack some build information into the system properties at runtime.
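One direction for such a hack (a sketch I haven't verified; the temp-dir approach and the fake version value are my assumptions) would be to synthesize the missing properties file and prepend it to the context classloader before the SparkContext is created:

import java.net.URLClassLoader
import java.nio.file.Files

// Sketch: fabricate spark-version-info.properties and make it visible to
// the context classloader that Spark reads from.
val tmp = Files.createTempDirectory("spark-version-info")
Files.write(
  tmp.resolve("spark-version-info.properties"),
  "version=2.0.1\n".getBytes("UTF-8")) // "version" is the only key the excerpt below confirms
val loader = new URLClassLoader(
  Array(tmp.toUri.toURL),
  Thread.currentThread().getContextClassLoader)
Thread.currentThread().setContextClassLoader(loader)
// ...then construct the SparkContext as before.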

The exception originates in package.scala in the Spark library. The relevant bits of code are:

val resourceStream = Thread.currentThread().getContextClassLoader.
  getResourceAsStream("spark-version-info.properties")

try {
  val unknownProp = "<unknown>"
  val props = new Properties()
  props.load(resourceStream)  // <--- causing an NPE?
  (
    props.getProperty("version", unknownProp),
    // Load some other properties
  )
} catch {
  case npe: NullPointerException =>
    throw new SparkException("Error while locating file spark-version-info.properties", npe)
}

It seems that the implicit assumption is that props.load will fail with an NPE if the version information can't be found in the resources. (That's not so clear to the reader!)
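That assumption is easy to reproduce in isolation: getResourceAsStream returns null for a missing resource, and passing that null to Properties.load fails exactly as in the stack trace above:

import java.util.Properties

// Minimal reproduction: load(null) throws from Properties$LineReader.readLine,
// the same frame as in the stack trace.
val props = new Properties()
props.load(null: java.io.InputStream) // java.lang.NullPointerException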

The NPE itself looks like it's coming from this code in java.util.Properties.java:

class LineReader {
    public LineReader(InputStream inStream) {
        this.inStream = inStream;
        inByteBuf = new byte[8192];
    }

    ...
    InputStream inStream;
    Reader reader;

    int readLine() throws IOException {
      ...
      inLimit = (inStream==null)?reader.read(inCharBuf)
                                :inStream.read(inByteBuf);

The LineReader is constructed with a null InputStream, which the class internally interprets as meaning that the reader field is non-null and should be used instead - but reader is null too. (Is this kind of thing really in the standard library? It seems very unsafe...)
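An explicit null check up front would surface the real problem; something like this (my own defensive sketch, not Spark's actual code) avoids leaning on the NPE:

import java.util.Properties

// Fail with a descriptive error instead of an NPE when the resource is absent.
val stream = Option(
  Thread.currentThread().getContextClassLoader
    .getResourceAsStream("spark-version-info.properties"))
val props = new Properties()
stream match {
  case Some(in) => try props.load(in) finally in.close()
  case None     => sys.error("spark-version-info.properties not found on classpath")
}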

Looking at the bin/spark-shell script that ships with Spark, it adds -Dscala.usejavacp=true when it launches spark-submit. Is this the right direction?
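If that flag is relevant here, one thing to try (purely a sketch; it may need to be passed to the JVM at launch rather than set inside the script, where it could be too late) is:

// Sketch: mimic spark-shell's -Dscala.usejavacp=true before creating the
// SparkContext. Setting it this late in the JVM's life may have no effect.
sys.props("scala.usejavacp") = "true"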

Thanks for your help!

rmin

1 Answer


The following seems to work with Ammonite 1.0.1 on Scala 2.11, but not with the experimental Scala 2.12 build.

It could simply be that this is implemented better in Spark 2.2:

#!/usr/local/bin/amm

import ammonite.ops._
import $ivy.`org.apache.spark:spark-core_2.11:2.2.0`
import $ivy.`org.apache.spark:spark-sql_2.11:2.2.0`
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.SparkSession

@main
def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Demo"))
}

Or, as a more expanded example:

@main
def main(): Unit = {
  val spark = SparkSession.builder()
    .appName("testings")
    .master("local")
    .config("configuration key", "configuration value")
    .getOrCreate()
  val sqlContext = spark.sqlContext
  val tdf2 = spark.read.option("delimiter", "|").option("header", true).csv("./tst.dat")
  tdf2.show()
}
user6273920
  • Thanks for the help. I ran your script but hit the dreaded "NoSuchMethodError ...ArrayOps", which is usually a binary-compatibility problem. That's using both Scala 2.12.3 and Scala 2.11.8 with Java 1.8.0_25 and Ammonite 1.0.1 (https://git.io/v7cx7). Are you able to list the versions of the tools you're using? I also tried deleting `~/.ammonite/cache` but still get the same error. – rmin Aug 20 '17 at 22:51
  • ~$ scala gives "Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)"; ~$ amm gives "Welcome to the Ammonite Repl 1.0.1 (Scala 2.11.11 Java 1.8.0_144)"; uname -r is 4.4.0-91-generic on Lubuntu 16.04 LTS 64-bit. I would say nuke $ rm -rf ~/.coursier/cache/v1/https/oss.sonatype.org/content/repositories/snapshots and, if that doesn't work, $ mv ~/.ivy2 ~/.ivy2.bak – user6273920 Aug 23 '17 at 03:58
  • Thanks @quantt. I realised that the Scala version defined by SCALA_HOME isn't used by Ammonite; it in fact bundles its own version of Scala. I also realised that even the older Ammonite installers seem to fetch the most recent version of Ammonite, but with older versions of Scala. When I used the amm version at https://git.io/v5Tcm the script worked. My output is: "Welcome to the Ammonite Repl 1.0.2 (Scala 2.11.11 Java 1.8.0_25)", which is actually a newer Ammonite than what I first tried, but with an older Scala. So currently this is a solution for Scala 2.11 but not 2.12. – rmin Aug 29 '17 at 08:48
  • :) The first line of my answer did say it works on 2.11 with 1.0.1 but not the experimental build - I should have said Scala 2.11 explicitly. – user6273920 Aug 29 '17 at 11:00