3

I can run "spark-shell" on my local PC, but I cannot get PySpark to run; it fails with the error shown in the log below.

I have also searched in many places, but nothing solved my problem. If anyone with PySpark experience could point me in the right direction, I would appreciate it. Thank you in advance.

My Config:

  • Spark: 3.2.0
  • Java: 17
  • Python: 3.8.6
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/10/29 10:37:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/10/29 10:37:08 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.base/java.lang.Thread.run(Thread.java:833)
C:\DS\spark\python\pyspark\shell.py:42: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "C:\DS\spark\python\pyspark\shell.py", line 38, in <module>
    spark = SparkSession._create_shell_session()  # type: ignore
  File "C:\DS\spark\python\pyspark\sql\session.py", line 553, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "C:\DS\spark\python\pyspark\sql\session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\DS\spark\python\pyspark\context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\DS\spark\python\pyspark\context.py", line 146, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "C:\DS\spark\python\pyspark\context.py", line 209, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\DS\spark\python\pyspark\context.py", line 329, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\DS\spark\python\lib\py4j-0.10.9.2-src.zip\py4j\java_gateway.py", line 1573, in __call__
    return_value = get_return_value(
  File "C:\DS\spark\python\lib\py4j-0.10.9.2-src.zip\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
        at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
        at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
        at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:833)


C:\DS\spark\bin>SUCCESS: The process with PID 29408 (child process of PID 19416) has been terminated.
SUCCESS: The process with PID 19416 (child process of PID 37944) has been terminated.
SUCCESS: The process with PID 37944 (child process of PID 23752) has been terminated.
Trung Tran

5 Answers

4

I think you might be using the wrong version of Java. From the doc:

Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0. Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0. For the Scala API, Spark 3.2.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).

Try installing Java 11 instead of your current version.
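If you keep several JDKs installed, a minimal sketch like the one below can pin the driver to Java 11 when Spark is started from a plain Python script; the jdk-11 install path is only an assumed example location, and the variable has to be set before the first SparkSession/SparkContext is created:

import os
from pyspark.sql import SparkSession

# Point Spark at a Java 11 JDK before any driver JVM is launched.
# The path below is an example install location -- adjust to your machine.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-11"

spark = SparkSession.builder.appName("java11-check").getOrCreate()
print(spark.version)
spark.stop()

For the interactive pyspark shell, which launches its JVM via Spark's startup scripts, set JAVA_HOME in the Windows environment before running pyspark instead.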

Ismail H
  • Thank you very much. I installed Java 11 and pyspark is working. But spark-shell not working, occurs an error: Caused by: java.net.URISyntaxException: Illegal character in path at index 39: spark://[domain-address].com:28000/C:\classes – Trung Tran Oct 29 '21 at 15:08
  • what's the command you're launching? – Ismail H Oct 31 '21 at 09:16
  • is your issue resolved? I'm facing the same issue – Arvind Pant Dec 18 '21 at 19:16
4

PySpark uses Py4J, which requires:

--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED

It also requires:

--add-opens=java.base/sun.nio.ch=ALL-UNNAMED

So, to make PySpark work with Java 17, I create the session with:

SparkSession.builder.appName("test").config(
    "spark.driver.extraJavaOptions",
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED",
).getOrCreate()
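If you are launching the interactive pyspark shell (as in the question) rather than building the session yourself, the same flags can go into conf/spark-defaults.conf as spark.driver.extraJavaOptions, or be passed through the PYSPARK_SUBMIT_ARGS environment variable. A rough sketch of the latter for a standalone script, assuming no Spark JVM has been started yet in the process:

import os
from pyspark.sql import SparkSession

# Hand the same --add-opens flags to the driver JVM via spark-submit
# arguments; the trailing "pyspark-shell" token is required by PySpark's
# launcher.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    '--driver-java-options "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED '
    '--add-opens=java.base/java.lang=ALL-UNNAMED '
    '--add-opens=java.base/java.util=ALL-UNNAMED" pyspark-shell'
)

spark = SparkSession.builder.appName("test").getOrCreate()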

Hope this helps.

bloussou
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/31526851) – Dan Apr 16 '22 at 15:21
  • Thanks Bloussou, this option solved my problem: "java.lang.ExceptionInInitializerError: Exception java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x34f7cfd9) cannot access class sun.nio.ch.DirectBuffer" – alejomarchan Apr 22 '22 at 17:04
  • This worked for me on Windows 10 & Java 18, thanks. Was just trying to do something quick and dirty without using WSL :-) – Mark Simpson May 22 '22 at 22:38
  • @bloussou your answer seems to be helpful; could you please help me with a similar issue for Java 11 on the apache/spark-py docker image? I am facing the same error, "py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.", when using the apache/spark-py docker image to run pyspark, and I'm unable to resolve it. That image uses Java 11.0.16; can you please suggest what to use instead for Java 11 – inkarar Jan 20 '23 at 15:01
1

Yes, this is a Java version issue. I installed OpenJDK 8, uninstalled the other versions of Java, and PySpark works fine now.

I was using:

  • Spark - 3.2.0
  • OS - macOS Monterey
  • Java - 16.0.1
  • Python - 3.9

Check the installed Java versions:

/usr/libexec/java_home --verbose

Uninstall the other Java versions:

brew uninstall AdoptOpenJDK

Run pyspark again and it should start without the error.
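To confirm which Java version the Spark driver actually picked up, a quick diagnostic sketch is shown below; it goes through the internal _jvm handle, so treat it as a debugging aid rather than a supported API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("java-version-check").getOrCreate()
# Ask the driver JVM (via the Py4J gateway) which Java it is running on.
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))
spark.stop()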

Arvind Pant
0

From what I understand of this error, this is a compatibility issue. When installing Spark, you'll need to select "Pre-built for Apache Hadoop 2.7" as your package type:
https://spark.apache.org/downloads.html

Then use the winutils.exe from hadoop-2.7.7/bin for your Hadoop folder:
https://github.com/cdarlint/winutils/tree/master/hadoop-2.7.7
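If you go this route on Windows, Spark also needs to be told where that Hadoop folder is. A minimal sketch, assuming winutils.exe was copied to C:\hadoop\bin (the C:\hadoop location is just an example, not part of the answer):

import os
from pyspark.sql import SparkSession

# Tell Spark/Hadoop where winutils.exe lives before the driver JVM starts.
os.environ["HADOOP_HOME"] = r"C:\hadoop"                     # example path
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]  # so winutils.exe is found

spark = SparkSession.builder.appName("winutils-check").getOrCreate()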

Jedidiah
0

Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+. Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0. https://spark.apache.org/docs/latest/

That quote is from the docs for Spark 3.3 and later, where Java 17 is supported; Spark 3.2.0 only supports Java 8/11, so either upgrade Spark or switch to Java 8/11.