I am writing a DataFrame as a TSV file to the Databricks File System (DBFS). The data is large (30 GB to 1 TB). I am currently using the code below:
df.coalesce(1).write.format("csv").option("delimiter", "\t").option("nullValue",None).option("header", inheader).mode("overwrite").save(tsvPathtemp)
For 100 GB it takes about an hour to write the file. I tried removing the coalesce(1); the write produced multiple part files, but I need a single TSV file as the output.
Can anyone suggest a better approach or code to produce the file?
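For context, this is roughly the variant without coalesce(1) that I tried, sketched here as a Scala cell (%scala) so it lines up with the Hadoop question below; tsvPathtemp is the same output path as above and the header flag is hard-coded only for illustration. It writes one part file per partition in parallel, which is faster but leaves many files; merging them is sketched after the imports below.

```scala
// Parallel write: each partition writes its own part-*.csv file under tsvPathtemp.
// Fast, but the result is a directory of part files rather than one TSV.
df.write
  .format("csv")
  .option("delimiter", "\t")
  .option("header", "true")
  .mode("overwrite")
  .save(tsvPathtemp)
```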
Also, how can I import the Hadoop file system classes in a Databricks notebook? I mean imports like the following:
import org.apache.hadoop.fs.FileUtil
import org.apache.hadoop.fs.FileSystem
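What I would like to do with those imports is something along these lines (a minimal sketch, assuming the part files were written to tsvPathtemp as above; the destination path is a placeholder, and FileUtil.copyMerge is my assumption for the merge step: it exists in Hadoop 2.x but was removed in Hadoop 3, so it may not be available on newer Databricks Runtimes):

```scala
// In a Databricks notebook this would go in a Scala cell (%scala).
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Reuse the Hadoop configuration Spark already carries; on Databricks it knows
// how to resolve dbfs:/ paths.
val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration

// srcDir is the directory of part files from the parallel write above;
// dstFile is a placeholder for the single TSV I want to end up with.
val srcDir  = new Path(tsvPathtemp)
val dstFile = new Path("/tmp/output/final.tsv")
val fs: FileSystem = srcDir.getFileSystem(hadoopConf)

// copyMerge concatenates every file under srcDir into dstFile.
// deleteSource = true removes the part files afterwards.
FileUtil.copyMerge(fs, srcDir, fs, dstFile, true, hadoopConf, null)
```

One caveat I am aware of: with header set to true, every part file carries its own header row, so a naive merge like this would repeat the header inside the combined file.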