
I am getting the following error when I try to run a sample command in a Spark session after entering the PySpark CLI.

I have installed OpenJDK 11 (openjdk 11 2018-09-25), the latest Python (Python 3.10.4), and the latest Spark (3.2.1). Just to check whether PySpark is working, I ran a simple sample command, e.g.:

 rdd = sc.parallelize([1, 2, 3])
 rdd.first()

I have set the environment variables for Java, Python, and Spark, but I am not sure what is going wrong here. I would appreciate anyone's help, or some pointers with which I can debug it.
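As a first debugging step, a minimal sketch that prints what the driver process actually sees (`JAVA_HOME`/`SPARK_HOME`/`PYSPARK_PYTHON` are the conventional variable names and an assumption about this particular setup):

```python
# Hedged sketch: print the environment variables PySpark typically depends on.
# The names below are the conventional ones (an assumption about this setup);
# an unset PYSPARK_PYTHON is what can let Windows fall back to the Store
# "python" app-execution alias when Spark launches a worker.
import os

for name in ("JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
    print(f"{name} = {os.environ.get(name)}")
```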

When I run `rdd.first()`, I get the following error:

>>> rdd =sc.parallelize([1,2,3])
>>> rdd.first()
Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
22/03/25 00:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:188)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:108)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:121)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:162)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketTimeoutException: Accept timed out
        at java.base/java.net.PlainSocketImpl.waitForNewConnection(Native Method)
        at java.base/java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:163)
        at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
        at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:551)
        at java.base/java.net.ServerSocket.accept(ServerSocket.java:519)
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:175)
        ... 14 more
22/03/25 00:14:37 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (192.168.56.1 executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:188)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:108)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:121)
        at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:162)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketTimeoutException: Accept timed out
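
The first line of the output ("Python was not found; run without arguments to install from the Microsoft Store...") suggests the Spark worker is resolving `python` to the Windows Store app-execution alias rather than a real interpreter. A quick hedged check from the driver side (assuming Windows and only the standard library):

```python
# Hedged check: see which executable the bare name "python" resolves to on PATH.
# If this prints None, or a path under ...\WindowsApps\..., the worker is
# hitting the Microsoft Store alias instead of the real Python 3.10 install.
import shutil

print(shutil.which("python"))
```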
  • Please check whether your pyspark version is the same as your Spark version, and set the environment variable `PYSPARK_PYTHON` to `python` (assuming you already have Python installed). – anky Mar 25 '22 at 06:19
  • @anky Can you tell me how to do that, i.e. how do I check whether the pyspark version is the same as the Spark version? Yes, I have installed only Java, Python, and Spark; I set the environment variables for Java and Spark, and for Python I chose the option in the installer that sets the env variable. Also, I don't see anything like `PYSPARK_PYTHON` in the system variables section or the user variables section. – kashif Mar 25 '22 at 07:06
  • As far as I can check from cmd, both look like the same version; pyspark and Spark are both `3.2.1`. – kashif Mar 25 '22 at 07:15
  • The issue got fixed after adding the variable below. Add it in the `user variables` section, not the `system variables` section: ```variable name : PYSPARK_PYTHON variable value : python``` Thanks @Ramineni Ravi Teja and anky. – kashif Mar 25 '22 at 07:41
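
For completeness, a minimal sketch of the same fix applied programmatically rather than through the Windows environment-variable dialog (an assumption: it must run before the SparkContext is created; `sys.executable` is used here instead of the bare value `python` so the worker gets the exact interpreter running the driver):

```python
# Hedged sketch of the fix from the comments: point PYSPARK_PYTHON at a real
# interpreter so the worker no longer resolves to the Windows Store alias.
# Assumption: this runs before the SparkContext/SparkSession is created.
import os
import sys

os.environ["PYSPARK_PYTHON"] = sys.executable

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
print(sc.parallelize([1, 2, 3]).first())  # expected output: 1
```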
