
I want to convert a CSV file to Parquet using a Jupyter notebook and Python 3 (PySpark). However, I get the following error:

Py4JJavaError                             Traceback (most recent call last)

Py4JJavaError: An error occurred while calling o40.parquet.
: org.apache.spark.SparkException: Job aborted.
        at ...
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
        at ...

Caused by: org.apache.spark.SparkException: Task failed while writing rows.

Caused by: java.net.SocketException: Connection reset by peer: socket write error

How can I resolve this, please?
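
For reference, the conversion itself is just the standard PySpark read-then-write (simplified here, with placeholder paths):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

# Read the CSV with a header row, letting Spark infer the column types
df = spark.read.csv("input.csv", header=True, inferSchema=True)

# Write the same data back out as Parquet; this is the call that raises the error above
df.write.parquet("output.parquet")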

jusmin
  • You should supply the code that you are trying to run and the Python traceback if you have it (rather than the Java stack trace). – Chris May 15 '20 at 07:12

1 Answer


Make sure you have the Hadoop binaries available and that HADOOP_HOME is set.

If not, download them from here.

Then set HADOOP_HOME (and JAVA_HOME):

import os

# Point Spark at the local Hadoop binaries and the JDK (adjust the paths to your installation)
os.environ['HADOOP_HOME'] = r"C:\hadoop-2.7.1"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_212"

Then write the file out as Parquet.
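
A minimal end-to-end sketch (the Hadoop/JDK paths and file names below are placeholders and should match your own setup):

import os

# These must be set before the first SparkSession is created;
# restart the notebook kernel if Spark is already running
os.environ["HADOOP_HOME"] = r"C:\hadoop-2.7.1"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_212"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

df = spark.read.csv("input.csv", header=True, inferSchema=True)
df.write.parquet("output.parquet")

On Windows, HADOOP_HOME should point to a directory whose bin folder contains winutils.exe; a missing winutils.exe is a common cause of task failures like the one above.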

Shubham Jain