We are using PySpark 1.6 and are trying to convert text files to other formats (JSON, CSV, etc.) with compression (gzip, lz4, snappy, etc.), but we cannot get compression to work.
Please find the code we tried below. Please point out the issue in our code, or else suggest a workaround. To add to the question: none of the compression codecs work in 1.6, but the same code works fine in Spark 2.x.
Option 1:
from pyspark import SparkContext, SparkConf
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
df = sqlContext.read.format('parquet').load('hdfs:///user/U1/json_parque_snappy')
df.write.format('json').save('hdfs:///user/U1/parquet_json_snappy')
Option 2:
df = sqlContext.read.format('parquet').load('hdfs:///user/U1/json_parque_snappy')
df.write.format('json').option('codec','org.apache.hadoop.io.compress.SnappyCodec').save('hdfs:///user/U1/parquet_json_snappy_4')
Option 3:
df = sqlContext.read.format('parquet').load('hdfs:///user/U1/json_parque_snappy')
df.write.format('json').option('compression','snappy').save('hdfs:///user/U1/parquet_json_snappy')
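For reference, one workaround we are considering is to drop down to the RDD API, since RDD.saveAsTextFile already accepts a compression codec class in 1.6 even though the DataFrame writer options seem to be ignored. This is an untested sketch, and GzipCodec is only an example codec class:

```python
# Untested sketch for Spark 1.6: write a DataFrame as compressed JSON
# via the RDD API instead of df.write.format('json').
# GzipCodec is an example; any Hadoop codec on the classpath should work.
CODEC = "org.apache.hadoop.io.compress.GzipCodec"

def write_compressed_json(df, path, codec=CODEC):
    # df.toJSON() returns an RDD of JSON strings; saveAsTextFile
    # accepts compressionCodecClass in PySpark 1.6.
    df.toJSON().saveAsTextFile(path, compressionCodecClass=codec)
```

We have not verified this against our cluster yet, but the same RDD-level approach should also cover CSV if we map rows to delimited strings first.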