Py4JJavaError when calling collect() method on rdd in PySpark

Question

I'm new to PySpark/Spark and am practicing with a text file that contains just 5 lines of plain text. Below is the code:

text_rdd = sc.textFile(file_path)
text_rdd.collect()  # This collect() works fine and shows the data
text_rdd.flatMap(lambda x: x.split(" ")).collect()  # This collect() throws the error below


 Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 3) (Satish executor driver): java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:167)

You aren't repeating [this](https://stackoverflow.com/questions/73821909/pyspark-python-issue-py4j-protocol-py4jjavaerror-an-error-occurred-while-calli) question, are you? If not, how is it different? - mazaneicha, Sep 25 '22 at 15:40
I didn't understand that answer. Could you please elaborate? - SDE, Sep 25 '22 at 17:51
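
One possible reading of the trace above: the failing line is `Cannot run program "python3"`, which suggests Spark tried to start a Python worker process with the command `python3` and the operating system could not find it. That would also explain why the first `collect()` works: reading the text file needs no Python worker, while the `flatMap` lambda has to run in one. The sketch below assumes this is the cause (not confirmed in the thread) and that the workers should use the same interpreter as the driver script. `PYSPARK_PYTHON` is the environment variable PySpark reads to choose the worker interpreter; the variable name `python3_path` is only illustrative.

```python
import os
import shutil
import sys

# Diagnostic: is a "python3" command visible on PATH?
# shutil.which returns None when the command cannot be found.
python3_path = shutil.which("python3")
print("python3 on PATH:", python3_path)

# Assumption: workers should run the same interpreter as this script.
# This must be set before the SparkContext is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
print("PYSPARK_PYTHON set to:", os.environ["PYSPARK_PYTHON"])

# Afterwards, create the context as usual, for example:
# from pyspark import SparkContext
# sc = SparkContext.getOrCreate()
```

If `sc` already exists (for example in a shell or notebook session), the session probably has to be restarted for the setting to take effect, since the variable is read when the context is created.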
