I am running Zeppelin 0.7.0 on an emr-5.4.0 cluster, started with the default settings. EMR does not configure the %spark.dep
interpreter.
I have edited the file /etc/zeppelin/conf/interpreter.json, which originally contained the following spark entry:
"2ANGGHHMQ": {
  "id": "2ANGGHHMQ",
  "name": "spark",
  "group": "spark",
  "properties": {
    "spark.yarn.jar": "",
    "zeppelin.spark.printREPLOutput": "true",
    "master": "yarn-client",
    "zeppelin.spark.maxResult": "1000",
    "spark.app.name": "Zeppelin",
    "zeppelin.spark.useHiveContext": "true",
    "args": "",
    "spark.home": "/usr/lib/spark",
    "zeppelin.spark.concurrentSQL": "false",
    "zeppelin.spark.importImplicit": "true",
    "zeppelin.pyspark.python": "python",
    "zeppelin.dep.localrepo": "/usr/lib/zeppelin/local-repo"
  },
  "interpreterGroup": [
    {
      "class": "org.apache.zeppelin.spark.SparkInterpreter",
      "name": "spark"
    },
    {
      "class": "org.apache.zeppelin.spark.PySparkInterpreter",
      "name": "pyspark"
    },
    {
      "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
      "name": "sql"
    }
  ],
  "option": {
    "remote": true,
    "port": -1,
    "perNoteSession": false,
    "perNoteProcess": false,
    "isExistingProcess": false
  }
}
To get %spark.dep working, I have to manually add the following entry to the "interpreterGroup" list and restart Zeppelin:
{
  "class": "org.apache.zeppelin.spark.DepInterpreter",
  "name": "dep"
}
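For reference, the manual edit above could be scripted along these lines. This is only a minimal sketch, not an EMR or Zeppelin feature: the function names add_dep_interpreter and patch_file are made up for illustration, and it assumes the id-keyed entries (such as "2ANGGHHMQ") sit under a top-level "interpreterSettings" key in interpreter.json.

```python
import json

# The entry that has to be added by hand (taken from the snippet above).
DEP_ENTRY = {"class": "org.apache.zeppelin.spark.DepInterpreter", "name": "dep"}


def add_dep_interpreter(settings):
    """Append the dep entry to every setting in group "spark" that lacks it.

    `settings` maps interpreter ids (e.g. "2ANGGHHMQ") to setting objects.
    Returns True if at least one setting was modified.
    """
    changed = False
    for setting in settings.values():
        # Skip anything that is not a spark interpreter setting.
        if not isinstance(setting, dict) or setting.get("group") != "spark":
            continue
        group = setting.setdefault("interpreterGroup", [])
        already_there = any(
            entry.get("class") == DEP_ENTRY["class"] for entry in group
        )
        if not already_there:
            group.append(dict(DEP_ENTRY))
            changed = True
    return changed


def patch_file(path="/etc/zeppelin/conf/interpreter.json"):
    """Hypothetical helper: patch interpreter.json in place if needed."""
    with open(path) as f:
        config = json.load(f)
    # Assumption: settings live under "interpreterSettings"; otherwise
    # fall back to treating the whole document as the id-keyed mapping.
    settings = config.get("interpreterSettings", config)
    if add_dep_interpreter(settings):
        with open(path, "w") as f:
            json.dump(config, f, indent=2)
        return True
    return False
```

Calling patch_file() would rewrite the file only when the dep entry is missing, so running it twice is harmless. Zeppelin would still need to be restarted afterwards, as with the manual edit.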
Is there a way to make EMR keep the default Zeppelin settings instead of removing this entry?
UPDATE
Could someone also explain why the cluster I created this morning, by cloning the original cluster, has a completely different config?
"interpreterGroup": [
  {
    "name": "spark",
    "class": "org.apache.zeppelin.spark.SparkInterpreter",
    "defaultInterpreter": false,
    "editor": {
      "language": "scala",
      "editOnDblClick": false
    }
  },
  {
    "name": "pyspark",
    "class": "org.apache.zeppelin.spark.PySparkInterpreter",
    "defaultInterpreter": false,
    "editor": {
      "language": "python",
      "editOnDblClick": false
    }
  },
  {
    "name": "sql",
    "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
    "defaultInterpreter": false,
    "editor": {
      "language": "sql",
      "editOnDblClick": false
    }
  }
]