I am trying to convert CSV files to Parquet using PySpark.

parquet_file = "s3://bucket-name/prefix/"

parquet_df.write.format("parquet").option("compression", "gzip").save(parquet_file).mode(SaveMode.Overwrite)

I want to overwrite the existing Parquet file(s), but I get the following error. Could you please help?

Error occurred - 'NoneType' object has no attribute 'mode'

Traceback (most recent call last):
  File "/tmp/ma-test-csv-to-parquet-glue-job-2", line 173, in <module>
    result = write_to_parquet(nn_df1)
  File "/tmp/ma-test-csv-to-parquet-glue-job-2", line 147, in write_to_parquet
    parquet_df.write.format("parquet").option("compression", "gzip").save(parquet_file).mode(SaveMode.Overwrite)
AttributeError: 'NoneType' object has no attribute 'mode'

mck

1 Answer

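The save mode has to be set on the DataFrameWriter before calling save(), not after it. save() performs the write and returns None, which is why chaining .mode(...) onto its result raises the AttributeError. Note also that in PySpark, mode() takes a string such as "overwrite"; SaveMode.Overwrite belongs to the Scala/Java API:

parquet_df.write.mode("overwrite").format("parquet").option("compression", "gzip").save(parquet_file)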
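As a minimal sketch of the corrected call order, the write can be wrapped in a small helper. The function name write_to_parquet comes from the traceback in the question; the mode and compression parameters and their default values are assumptions added for illustration. The helper relies only on the DataFrameWriter methods already used above (mode, format, option, save).

```python
def write_to_parquet(df, path, mode="overwrite", compression="gzip"):
    """Write a PySpark DataFrame to `path` as Parquet.

    `mode` and `compression` are hypothetical parameters added for this
    sketch; their defaults mirror the values used in the question.
    """
    # Configure the writer first. mode(), format() and option() each
    # return the DataFrameWriter, so they can be chained in any order.
    writer = (
        df.write
        .mode(mode)  # PySpark takes the save mode as a string
        .format("parquet")
        .option("compression", compression)
    )
    # save() triggers the write and returns None, so it must be the
    # last call in the chain.
    writer.save(path)
```

With this helper, the call from the question would become write_to_parquet(nn_df1, parquet_file). Because save() returns None, the helper returns nothing either, so assigning its result to a variable is only useful if the function is extended to return something explicitly.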
blackbishop