This looks like a schema mismatch problem.
If you declare a field as non-nullable in your schema and then create your DataFrame with a None in that field, Spark throws ValueError: This field is not nullable, but got None.
PySpark:
from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType
schema = ArrayType(StructType([StructField('A', IntegerType(), nullable=False)]))
# This raises ValueError: field A is declared not nullable, but the first row contains None.
df = spark.createDataFrame([[[None]], [[2]]], schema=schema)
df.show()
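For contrast, here is a minimal sketch (the names nullable_schema and df_ok are just illustrative) showing that the exact same data is accepted once the field is declared nullable:
nullable_schema = ArrayType(StructType([StructField('A', IntegerType(), nullable=True)]))
df_ok = spark.createDataFrame([[[None]], [[2]]], schema=nullable_schema)
df_ok.show()  # the first row simply holds a null for A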
But that is not the case if you use a udf.
With the same schema, if you run the transformation through a udf, Spark won't throw a ValueError even when the udf returns a None or a value of the wrong type. That is where the data/schema mismatch creeps in.
For example:
df = spark.createDataFrame([[[1]], [[2]]], schema=schema)
def throw_none():
    def _throw_none(x):
        if x[0][0] == 1:
            return [['I AM ONE']]
        else:
            return x
    return udf(_throw_none, schema)
# Since column A only accepts IntegerType, Spark silently converts the
# string 'I AM ONE' in the first row to null instead of raising a
# ValueError. This is where the data/schema type mismatch happens!
df = df.select(throw_none()(col("value")).name('value'))
df.show()

Then the following Parquet write and read throws a parquet.io.ParquetDecodingException, because the written file now contains a null in a field the schema declares non-nullable:
df.write.parquet("tmp")
spark.read.parquet("tmp").collect()
So be very careful with null values when you use a udf: make sure the udf returns exactly the data type declared in the schema. And unless it is really necessary, don't set nullable=False in your StructField; setting nullable=True avoids this whole problem.
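Putting both fixes together, here is a minimal sketch (the names safe_schema, mark_one, and the path tmp_safe are just illustrative): declare the field nullable and return the declared type from the udf, and the Parquet round trip works:
safe_schema = ArrayType(StructType([StructField('A', IntegerType(), nullable=True)]))
df = spark.createDataFrame([[[1]], [[2]]], schema=safe_schema)

def mark_one():
    def _mark_one(x):
        if x[0][0] == 1:
            return [[-1]]  # return an int, matching the declared IntegerType
        else:
            return x
    return udf(_mark_one, safe_schema)

df = df.select(mark_one()(col("value")).name('value'))
df.write.parquet("tmp_safe")
spark.read.parquet("tmp_safe").collect()  # reads back cleanly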