In my company we are currently using the Spark Scala interpreter to dynamically generate class files with spark-jobserver. Those class files are generated on our Spark cluster driver and saved into a directory (on that driver) defined via the "-Yrepl-outdir" option from the standard ScalaSettings. That directory acts as a sort of cache for our executors, which load the class files from there.
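For reference, this is roughly how we pass that option to the interpreter settings (a minimal sketch; the directory path here is just a placeholder):

import scala.tools.nsc.Settings

val settings = new Settings()
settings.usejavacp.value = true
// "-Yrepl-outdir" tells the REPL compiler where to write the generated $lineN class files;
// the executors later load the class files from this directory.
settings.processArgumentString("-Yrepl-outdir /tmp/spark-repl-classes")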
Everything works fine with the standard setup of one interpreter per driver, but the problem occurs when I try to improve performance by introducing multiple interpreters running in parallel. I used the Akka router pattern with a single interpreter per routee, each routee running in its own thread, and of course I hit a wall: the interpreters overwrite each other's results in the output directory while evaluating class files.
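The routing setup looks roughly like this (a sketch only; InterpreterWorker, EvalRequest and the pool size are placeholders, and the interpreter creation itself is omitted):

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

// Message carrying a snippet of Scala code to evaluate (hypothetical name).
case class EvalRequest(code: String)

class InterpreterWorker extends Actor {
  // Each routee owns exactly one interpreter, created when the actor starts,
  // so requests routed to different routees are handled by different interpreters.
  // private val interpreter = addInterpreter(...)   // omitted in this sketch
  def receive = {
    case EvalRequest(code) =>
      // interpreter.interpret(code)
      sender() ! s"evaluated: $code"
  }
}

object InterpreterRouter extends App {
  val system = ActorSystem("interpreters")
  // Four routees means four interpreters compiling in parallel, which is exactly
  // when they start overwriting each other's class files in the shared output directory.
  val router = system.actorOf(RoundRobinPool(4).props(Props[InterpreterWorker]), "interpreter-router")
  router ! EvalRequest("1 + 1")
}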
I've tried to fix it by giving each interpreter its own output directory, but those directories were not recognized by Spark as places to look for the generated class files. For each interpreter I defined a separate output directory via the same "-Yrepl-outdir" option, but somehow that wasn't enough.
I also tried swapping in a different class loader to modify the default names of the generated packages/classes, so that each one would start with a prefix unique to its interpreter, but I haven't found a working solution yet.
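The closest I've come is along these lines (again just a sketch; as far as I can tell the $lineN prefix comes from the interpreter's session naming, which seems to be configurable through the scala.repl.name.line system property, but whether IMain really honours it in my setup is an assumption on my part):

// Changes the "$line" prefix of the generated packages for every interpreter in this JVM.
// A system property is JVM-wide, so it cannot give each interpreter its own prefix.
System.setProperty("scala.repl.name.line", "p466234$line")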
Since reproducing this issue requires a running Spark cluster and a programmatic setup of the Spark Scala interpreter, I'll just show a simplified method that illustrates how we create the interpreter in general:
import java.io.File
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain
import org.apache.spark.SparkConf

def addInterpreter(classpath: Seq[String], outputDir: File, loader: ClassLoader, conf: SparkConf): IMain = {
  val settings = new Settings()
  val writer = new java.io.StringWriter()   // collects compiler output from the interpreter
  settings.usejavacp.value = true
  settings.embeddedDefaults(loader)
  // Dedupe classpath entries and strip the "file:" URI prefix before handing them to the compiler.
  settings.classpath.value = (classpath.distinct mkString File.pathSeparator).replace("file:", "")
  SparkIMainServer.createInterpreter(conf, outputDir, settings, writer)
}
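For completeness, a call to that method from one of the routees would look roughly like this (the classpath entry and directory are placeholders; SparkIMainServer is our own wrapper around IMain):

val conf = new SparkConf().setAppName("interpreter-pool")
val interpreter: IMain = addInterpreter(
  classpath = Seq("/opt/jobserver/lib/our-jobs.jar"),
  outputDir = new File("/tmp/repl-outdir-routee-1"),
  loader    = getClass.getClassLoader,
  conf      = conf
)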
Here you can see some simplified output of my running interpreters, with the packages in the left-side panel and the content of one of them ($line3) on the right. What I think would solve my problem is giving custom names to those packages: instead of $line1, $line2, etc., something like p466234$line1, p198934$line2, etc., with a unique prefix for each interpreter.
So, what's the easiest way to rename those class files/packages generated by the Spark Scala interpreter? Is there any other solution to this problem?