I'm on Spark version > 2. While trying to convert a large pandas DataFrame to a Spark DataFrame and write it to S3, I got this error:

Serialized task 880:0 was 665971191 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.
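The conversion step looks roughly like this (a minimal sketch; pdf, the column names, and the S3 path are placeholders for the real, much larger data):

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("myWork").getOrCreate()

# stand-in for the real pandas DataFrame, which is far larger
pdf = pd.DataFrame({"id": range(1000), "value": range(1000)})

# createDataFrame serializes the pandas data on the driver and ships it
# to the executors inside the tasks, which is where the oversized
# serialized task comes from
sdf = spark.createDataFrame(pdf)

# placeholder S3 path
sdf.write.mode("overwrite").parquet("s3a://my-bucket/output/")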

I tried repartitioning to increase the number of partitions, but it did not solve the problem.
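The repartition attempt looked roughly like this (again with the placeholder names from above; the partition count of 200 is just an example):

# increase the partition count before writing out
sdf = spark.createDataFrame(pdf).repartition(200)
sdf.write.mode("overwrite").parquet("s3a://my-bucket/output/")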

I read through this related question, Pyspark: Serialized task exceeds max allowed. Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values, and tried the following:

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession


spark = (SparkSession.builder
         .master("yarn")
         .appName("myWork")
         # attempt to raise the RPC message size limit above the 128 MB default
         .config("spark.rpc.message.maxSize", "1024mb")
         .getOrCreate())

I still get the same error. Any suggestions?

