
I had a CSV file stored in an Azure Data Lake Storage Gen2 account, which I imported into Databricks by mounting the Data Lake account on my Databricks cluster. After doing preprocessing, I want to store the CSV back in the same Data Lake Gen2 (Blob Storage) account. Any leads and help on the issue are appreciated. Thanks.

– inr
2 Answers


Just write a file to the same mounted location. See the example notebook here: https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html#example-notebook

df.write.json("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/iot_devices.json")
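The linked example writes JSON; for the CSV in the question, a minimal PySpark equivalent (using the same placeholder file system and storage account names as the docs example):

# Write the preprocessed DataFrame back as CSV to the same ADLS Gen2 file system.
(df.write
    .format("csv")
    .option("header", "true")
    .save("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/preprocessed"))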
– silent
  • Thanks a lot, got it! But it is saving the CSV file under a random name; any help on saving it under my own defined filename? Thanks for the help! – inr Sep 27 '19 at 11:03
  • We can't write the file with a specific name while writing into HDFS. Use coalesce(1) to generate a single file, then rename it once it is generated, e.g. hadoop fs -mv part-*.csv desired_name; see the sketch below. – Pabbati Sep 27 '19 at 13:48
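A minimal PySpark sketch of the comment's approach for a Databricks notebook, using dbutils.fs.mv in place of the hadoop fs -mv shell command; the mount name and file names below are placeholders:

# Write a single part file, then move it to the desired name.
tmp_dir = "/mnt/<mount-name>/output_tmp"           # placeholder temp directory
final_path = "/mnt/<mount-name>/preprocessed.csv"  # desired file name

(df.coalesce(1)                                    # force a single output partition
    .write
    .format("csv")
    .option("header", "true")
    .mode("overwrite")
    .save(tmp_dir))

# Spark names the output part-00000-<uuid>.csv; find it and move it.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, final_path)
dbutils.fs.rm(tmp_dir, recurse=True)               # clean up the temp directory

Note that coalesce(1) funnels all the data through a single task, so this is only practical for outputs small enough to be handled by one worker.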

Just save it directly to Blob storage.

(df.write
    .format("csv")  # built-in CSV source; "com.databricks.spark.csv" is the legacy name
    .option("header", "true")
    .save("/mnt/<mount-name>/myfile.csv"))  # write straight to the mounted storage path

There is no point in saving the file locally and then pushing it into the Blob.
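As a quick sanity check (same placeholder mount as above), listing the target path in the notebook confirms the output landed in the mounted storage. Note that Spark creates myfile.csv as a directory of part files, so the rename trick from the first answer's comments applies here too:

dbutils.fs.ls("/mnt/<mount-name>/myfile.csv")  # lists the part files Spark wrote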

– ASH