
I'm trying to run a PySpark script remotely against AWS EMR, following the instructions provided by AWS. However, when I submit the script, I get the following exception:

Traceback (most recent call last):
  File "/home/aco/src/test_remote_pyspark.py", line 19, in <module>
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/pyspark/context.py", line 288, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/py4j/java_gateway.py", line 1525, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/aco/.local/share/virtualenvs/prototypes-dS3RdFhP/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:160)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    ... 20 more
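
For context, the part of test_remote_pyspark.py that triggers this is essentially the following (a minimal sketch: the app name is a placeholder and the exact conf values are trimmed; line 19 is the getOrCreate() call from the traceback, and the master is yarn, which the YarnClientSchedulerBackend in the trace confirms):

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.setMaster("yarn")            # client-mode YARN submission to the EMR cluster
conf.setAppName("remote-test")    # placeholder name
# HADOOP_CONF_DIR/YARN_CONF_DIR point at the config files copied from the
# EMR master node, as the AWS instructions describe
spark = SparkSession.builder.config(conf=conf).getOrCreate()  # line 19, where it fails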

I have SPARK_HOME set to the directory where I uncompressed Spark, and I can see that it contains JARs:

$ echo $SPARK_HOME
/home/aco/Downloads/spark-2.4.0-bin-hadoop2.7
$ ls $SPARK_HOME/jars | head
activation-1.1.1.jar
aircompressor-0.10.jar
antlr-2.7.7.jar
antlr4-runtime-4.7.jar
antlr-runtime-3.4.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.4.0-b34.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
apache-log4j-extras-1.2.17.jar
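
Since the missing class com.sun.jersey.api.client.config.ClientConfig belongs to Jersey 1.x, I assume it is worth checking which Jersey JARs this Spark build actually ships:

$ ls $SPARK_HOME/jars | grep -i jersey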

I don't really have any clue how to debug this. Any ideas?

  • Have you tried adding this http://repo1.maven.org/maven2/com/sun/jersey/jersey-bundle/1.19.4/jersey-bundle-1.19.4.jar file to your `$SPARK_HOME/jars`? – Ali AzG May 12 '19 at 08:20
  • I had the same issue and tried --packages com.sun.jersey:jersey-bundle:1.19.4, but it didn't help. – Artem Trunov Dec 12 '19 at 16:38
