
I am trying to write a Parquet file out as CSV using df.write.csv, but the output CSV file gets an auto-generated name (part-0000-...). How can I rename it?

I searched and found that it can be done in Scala with the following code:

import org.apache.hadoop.fs._

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.rename(new Path("csvDirectory/data.csv/part-0000"), new Path("csvDirectory/newData.csv"))

How can it be done in pyspark?

Mohana B C

1 Answer


It cannot be done with Spark's DataFrame API directly. The Scala solution can be adapted to PySpark by going through the JVM gateway:

# Get a handle on the Hadoop FileSystem through Spark's JVM gateway
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())


def rename(old_file_name, new_file_name):
    # FileSystem.rename returns True on success, False otherwise
    fs.rename(
        spark._jvm.org.apache.hadoop.fs.Path(old_file_name),
        spark._jvm.org.apache.hadoop.fs.Path(new_file_name),
    )
Steven