We are constrained by the platform we use, which ships Spark 2.4.7 on top of Hadoop 2.7.7 libraries. We have data on S3 stored as Zstandard-compressed Parquet. Is there a way we can write custom code of some kind to read this Zstandard Parquet in our job?
We don't have access to the infrastructure, so we cannot install anything additional on the machines. All we can do is scale the executors up or down (vertically and horizontally).
We have full control over the job code; that is what we are required to manage and submit to the platform, which in turn submits it to Spark and executes it.
When we try to read the file using spark.read.parquet("file path"), we get this error: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.ZStandardCodec
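The read itself is nothing special (the path below is just a placeholder for our actual S3 location):

```scala
// Plain Parquet read of the zstd-compressed data; `spark` is the job's SparkSession.
val df = spark.read.parquet("s3://<bucket>/<path-to-zstd-parquet>/")
df.show()
// Fails with:
// java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.ZStandardCodec
```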
This is obviously expected, since Hadoop 2.7.7 does not include a Zstandard codec. When we include the hadoop-common 2.9.1 dependency, which does have Zstandard codec support, we get a different error instead: this version of libhadoop was built without zstd support
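For reference, the dependency we added is roughly the following (sbt syntax shown just for illustration; the exact build tool is not the point). Our understanding is that Hadoop's ZStandardCodec delegates to the native libhadoop library, which would explain why adding the jar alone is not enough on the platform's machines:

```scala
// Assumed sbt-style declaration, for illustration only.
// hadoop-common 2.9.1 provides org.apache.hadoop.io.compress.ZStandardCodec,
// but that codec calls into native libhadoop, which on the platform was
// built without zstd support -- hence the second error above.
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.9.1"
```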
Is there a way to write a custom class to read the Zstandard-compressed Parquet into a Spark DataFrame?
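To make the question more concrete, here is a very rough, untested sketch of the kind of custom class we have in mind: a read-only Hadoop CompressionCodec whose decompression side is backed by a pure-Java zstd implementation (the zstd-jni artifact and all class names below are our assumptions for illustration, not something we have working), so that native libhadoop would not be needed at all. Part of the question is whether Spark/Parquet can be pointed at something like this instead of org.apache.hadoop.io.compress.ZStandardCodec.

```scala
import java.io.{InputStream, OutputStream}

import com.github.luben.zstd.ZstdInputStream
import org.apache.hadoop.io.compress.{
  CompressionCodec, CompressionInputStream, CompressionOutputStream, Compressor, Decompressor
}

// Rough sketch only: a read-only codec backed by the pure-Java zstd-jni library
// (com.github.luben:zstd-jni) instead of native libhadoop.
class PureJavaZstdCodec extends CompressionCodec {

  // Decompression: wrap the raw input stream with zstd-jni's streaming decompressor.
  override def createInputStream(in: InputStream): CompressionInputStream =
    new CompressionInputStream(in) {
      private val zin = new ZstdInputStream(in)
      override def read(): Int = zin.read()
      override def read(b: Array[Byte], off: Int, len: Int): Int = zin.read(b, off, len)
      override def resetState(): Unit = ()   // nothing to reset for a one-shot read
      override def close(): Unit = zin.close()
    }

  override def createInputStream(in: InputStream, d: Decompressor): CompressionInputStream =
    createInputStream(in)                    // ignore Hadoop's pooled decompressors

  // We only need to read zstd Parquet, so the write side is deliberately unimplemented.
  override def createOutputStream(out: OutputStream): CompressionOutputStream =
    throw new UnsupportedOperationException("read-only codec sketch")
  override def createOutputStream(out: OutputStream, c: Compressor): CompressionOutputStream =
    throw new UnsupportedOperationException("read-only codec sketch")

  // Compressor/decompressor pooling is not supported in this sketch.
  override def getCompressorType(): Class[_ <: Compressor] = null
  override def createCompressor(): Compressor = null
  override def getDecompressorType(): Class[_ <: Decompressor] = null
  override def createDecompressor(): Decompressor = null
  override def getDefaultExtension(): String = ".zst"
}
```

Presumably something like this would also have to be registered on the job's Hadoop configuration (for example via spark.sparkContext.hadoopConfiguration.set("io.compression.codecs", ...)), but we are not sure that is enough for Parquet reads, which is exactly what we are asking.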
FYI: I have already checked other SO questions, and they did not cover my use case, especially because of the restriction on infrastructure access.