Spark version > 2. While trying to convert a large pandas DataFrame to a Spark DataFrame and write it to S3, I got the following error:
Serialized task 880:0 was 665971191 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.
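The conversion and write look roughly like this (a minimal sketch; the DataFrame contents, the simplified session setup, and the S3 path are placeholders, not my real data):
import pandas as pd
from pyspark.sql import SparkSession

# Session creation simplified here; the actual builder call is shown further below.
spark = SparkSession.builder.getOrCreate()

# pdf stands in for the large pandas DataFrame (placeholder data; the real one is much bigger).
pdf = pd.DataFrame({"id": range(1_000_000), "value": [0.0] * 1_000_000})

# createDataFrame ships the local pandas data to the executors as serialized tasks,
# which, as far as I can tell, is where the oversized message comes from.
sdf = spark.createDataFrame(pdf)

# Bucket and prefix are placeholders.
sdf.write.mode("overwrite").parquet("s3a://my-bucket/output/")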
I tried repartitioning to increase the number of partitions, but it did not solve the problem.
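Roughly what the repartition attempt looked like (the partition count is just one of the values I experimented with):
# Same pipeline as above, but repartitioning before the write;
# 1000 is only an example value.
sdf = spark.createDataFrame(pdf).repartition(1000)
sdf.write.mode("overwrite").parquet("s3a://my-bucket/output/")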
I read through this question: Pyspark: Serialized task exceeds max allowed. Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values
and tried the following:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("yarn")
         .appName("myWork")
         .config("spark.rpc.message.maxSize", "1024mb")
         .getOrCreate())
I still got the same error. Any suggestions?
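For reference, this is how I check what value the running context actually reports for the key from the error message (a small sketch using SparkConf.get; the default string is a placeholder):
# Read back the value the running SparkContext reports for spark.rpc.message.maxSize;
# the second argument is returned if the key is not set.
print(spark.sparkContext.getConf().get("spark.rpc.message.maxSize", "<not set>"))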