To write a Parquet file compressed with the LZO codec, I wrote the following code:
df.coalesce(1).write.option("compression","lzo").option("header","true").parquet("PARQUET.parquet")
But I am getting this error:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.lzo.LzoCodec
According to the Spark documentation, Brotli compression requires BrotliCodec to be installed, but no installation steps are given. Compressing with the Brotli codec fails with the same kind of ClassNotFoundException.
How can I install/add the required codecs so that these compression options work in PySpark?
EDIT: LZO compression works with ORC but not with Parquet.