When I try to execute this line in the pyspark shell:
arquivo = sc.textFile("dataset_analise_sentimento.csv")
I get the following error message:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver):
org.apache.spark.SparkException: Python worker failed to connect back.
I have tried the following steps:
- Checked the environment variables (see the first sketch after this list).
- Re-checked the steps for installing Apache Spark on Windows 10.
- Tried different versions of Apache Spark (2.4.3 / 2.4.2 / 2.3.4).
- Disabled the Windows firewall and the antivirus that I have installed.
- Tried to initialize the SparkContext manually with
sc = spark.sparkContext
(a possible solution I found in this question here on Stack Overflow; it didn't work for me; see the second sketch after this list).
- Tried to change the value of
PYSPARK_DRIVER_PYTHON
from jupyter to ipython, as said in this link, with no success (third sketch after this list).
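For reference, this is roughly the snippet I used to check the environment variables from inside Python (these are the variables relevant for Spark on Windows; the values on my machine may of course differ from yours):

import os

# Print the Spark-related environment variables to verify they are set.
for var in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME",
            "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
    print(var, "=", os.environ.get(var))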
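The manual initialization mentioned above was along these lines (the app name is arbitrary, just a label for the job):

from pyspark.sql import SparkSession

# Build the session explicitly instead of relying on the sc
# that the pyspark shell creates on startup.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("analise-sentimento") \
    .getOrCreate()
sc = spark.sparkContext

# Same read that triggers the error.
arquivo = sc.textFile("dataset_analise_sentimento.csv")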
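The PYSPARK_DRIVER_PYTHON change has to happen before bin\pyspark is launched (the launcher reads the variable at startup), so I set it in the Windows prompt; in Python terms it amounts to:

import os

# Equivalent of `set PYSPARK_DRIVER_PYTHON=ipython` in the Windows prompt,
# replacing the jupyter front end with ipython.
os.environ["PYSPARK_DRIVER_PYTHON"] = "ipython"

# If jupyter-specific options were set alongside it, clear them too.
os.environ.pop("PYSPARK_DRIVER_PYTHON_OPTS", None)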
None of the steps above worked for me and I can't find a solution.
Currently I'm using the following versions:
Python 3.7.3, Java JDK 11.0.6, Windows 10, Apache Spark 2.3.4