
While trying to read a file using PySpark, I'm getting this error:

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 60459493. To avoid this, increase spark.kryoserializer.buffer.max value.

Here is the code below; can anyone tell me how to resolve this error?

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config('spark.yarn.queue', 'shared_adhoc_mid')
    # .config('spark.sql.shuffle.partitions', '1000')
    # .config('spark.default.parallelism', '1000')
    # .config('spark.dynamicAllocation.maxExecutors', '1000')
    # .config('spark.executor.memory', '4g')
    # .config('spark.executor.memoryOverhead', '2g')
    # .config('spark.executor.cores', '4')
    # .config('spark.driver.maxResultSize', '10g')
    .config('spark.sql.autoBroadcastJoinThreshold', -1)
    .getOrCreate()
)

df = spark.read.parquet('path')
```
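The error message itself suggests the fix: the serialized record (about 60,459,493 bytes, roughly 58 MiB) does not fit in Kryo's output buffer, whose cap is controlled by `spark.kryoserializer.buffer.max` (default 64m). A minimal sketch of raising that cap when building the session (the `128m` value and app name are illustrative assumptions, not from the original post):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName('kryo-buffer-fix')  # hypothetical app name
    # The error needed ~58 MiB for one record; 128m leaves headroom.
    # Must be set before the session starts; max allowed is 2047m.
    .config('spark.kryoserializer.buffer.max', '128m')
    .getOrCreate()
)

df = spark.read.parquet('path')
```

Note that this setting only takes effect at session creation, so it must be passed to the builder (or `spark-submit --conf`) rather than changed on a running session.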
