
I am trying to write a Parquet file out as CSV using df.write.csv, but the output CSV file gets an auto-generated name (part-0000-...). How can I rename it?

I searched and found that it can be done in Scala with the following code:

import org.apache.hadoop.fs._

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.rename(new Path("csvDirectory/data.csv/part-0000"), new Path("csvDirectory/newData.csv"))

How can it be done in pyspark?

Mohana B C

1 Answer


It cannot be done with Spark's DataFrame API directly. The Scala solution can be adapted to PySpark by going through the JVM gateway:

# Get a handle on the Hadoop FileSystem through Spark's JVM gateway
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())


def rename(old_file_name, new_file_name):
    # FileSystem.rename returns True on success, False otherwise
    fs.rename(
        spark._jvm.org.apache.hadoop.fs.Path(old_file_name),
        spark._jvm.org.apache.hadoop.fs.Path(new_file_name),
    )
Steven