While trying to read a file using PySpark, I'm getting this error:

> org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 60459493. To avoid this, increase spark.kryoserializer.buffer.max value.

Here is the code. Can anyone tell me how to resolve this error?
```python
spark = spark(options={'spark.yarn.queue': 'shared_adhoc_mid',
                       # 'spark.sql.shuffle.partitions': '1000',
                       # 'spark.default.parallelism': '1000',
                       # 'spark.dynamicAllocation.maxExecutors': '1000',
                       # 'spark.executor.memory': '4g',
                       # 'spark.executor.memoryOverhead': '2g',
                       # 'spark.executor.cores': '4',
                       # 'spark.driver.maxResultSize': '10g',
                       'spark.sql.autoBroadcastJoinThreshold': -1})
df = spark.read.parquet('path')
```
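The error message itself suggests raising `spark.kryoserializer.buffer.max` (the default is `64m`, and the error reports needing roughly 60 MB, right at that ceiling). As a minimal sketch, assuming the `spark(options=...)` wrapper above accepts arbitrary Spark configuration keys, the option could be added to the dict like this; `'512m'` is an assumed value comfortably above the reported requirement, not a recommendation from the error itself:

```python
# Bytes reported as "required" in the error message (~60 MB).
required_bytes = 60459493

# Extend the existing options dict with the setting the error message
# recommends. '512m' is an assumed headroom value; Kryo's buffer max
# must stay below 2048m.
options = {
    'spark.yarn.queue': 'shared_adhoc_mid',
    'spark.sql.autoBroadcastJoinThreshold': -1,
    'spark.kryoserializer.buffer.max': '512m',  # default is 64m
}

# Usage with the wrapper from the snippet above (hypothetical helper):
# spark = spark(options=options)
# df = spark.read.parquet('path')
```

The new value just needs to exceed the ~60 MB the serializer asked for; 512 MB leaves room if the objects being serialized grow.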