I am facing an error while running the following PySpark program. My environment:

OS Windows 10

Java version 8

Spark version 2.4.0

Python version 3.6

CODE:

    from pyspark.context import SparkContext

    sc = SparkContext.getOrCreate()
    textFile = sc.textFile(r"file.txt")
    textFile.count()

ERROR:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-7-99998e5c7b17> in <module>()
    ----> 1 textFile.count()
    Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 4, localhost, executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
        at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)...

Many people have hit the same problem and solved it by switching to Java 8, but I am already on Java 8 and still get this error.

Any help appreciated.

Thanks.

Nusrath
  • Any updates on this? Even I am facing this error with Java 8 release 202. I am using Python 3.7 running on Windows 10. – Indrajit Feb 08 '19 at 15:00
  • Was able to solve this by going back to Spark 2.3 as per this post: https://stackoverflow.com/questions/53252181/python-worker-failed-to-connect-back – Indrajit Feb 08 '19 at 15:17
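For readers landing here: a workaround often suggested for "Python worker failed to connect back" (including in the linked post) is to make sure the Python interpreter Spark launches for its workers is the same one running the driver, by setting the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables before the `SparkContext` is created. A minimal sketch, assuming the driver's own interpreter is the one you want the workers to use:

```python
import os
import sys

# Point both the driver and the launched Python workers at the interpreter
# running this script, so the worker Spark spawns matches the driver.
# These must be set before the SparkContext is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Then create the context as in the question, e.g.:
#   from pyspark.context import SparkContext
#   sc = SparkContext.getOrCreate()
```

The same variables can also be set system-wide in the Windows environment settings instead of in the script.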

0 Answers