
I had a CSV file stored in an Azure Data Lake Storage Gen2 account, which I imported into Databricks by mounting the Data Lake account on my Databricks cluster. After doing preprocessing, I want to store the CSV back in the same Data Lake Gen2 (Blob Storage) account. Any leads and help on the issue are appreciated. Thanks.

– inr
2 Answers


Just write a file to the same mounted location. See the example notebook here: https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html#example-notebook

df.write.json("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/iot_devices.json")
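The linked example writes JSON; for the CSV in the question, a minimal PySpark equivalent (using the same placeholder file system and storage account names as the docs example):

# Write the preprocessed DataFrame back as CSV to the same ADLS Gen2 file system.
(df.write
    .format("csv")
    .option("header", "true")
    .save("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/preprocessed"))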
– silent
  • Thanks a lot, got it! But it is saving the CSV file under a random name; any help on saving it under my own defined filename? Thanks for the help! – inr Sep 27 '19 at 11:03
  • We can't write the file with a specific name while writing into HDFS. Use coalesce(1) to generate a single file, then rename it once it is generated, e.g. hadoop fs -mv part-*.csv desired_name; see the sketch below. – Pabbati Sep 27 '19 at 13:48
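A minimal PySpark sketch of the comment's approach for a Databricks notebook, using dbutils.fs.mv in place of the hadoop fs -mv shell command; the mount name and file names below are placeholders:

# Write a single part file, then move it to the desired name.
tmp_dir = "/mnt/<mount-name>/output_tmp"           # placeholder temp directory
final_path = "/mnt/<mount-name>/preprocessed.csv"  # desired file name

(df.coalesce(1)                                    # force a single output partition
    .write
    .format("csv")
    .option("header", "true")
    .mode("overwrite")
    .save(tmp_dir))

# Spark names the output part-00000-<uuid>.csv; find it and move it.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, final_path)
dbutils.fs.rm(tmp_dir, recurse=True)               # clean up the temp directory

Note that coalesce(1) funnels all the data through a single task, so this is only practical for outputs small enough to be handled by one worker.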

Just save it directly to Blob storage.

(df.write
    .format("csv")  # built-in CSV source; "com.databricks.spark.csv" is the legacy name
    .option("header", "true")
    .save("/mnt/<mount-name>/myfile.csv"))  # write straight to the mounted storage path

There is no point in saving the file locally and then pushing it into the Blob.
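As a quick sanity check (same placeholder mount as above), listing the target path in the notebook confirms the output landed in the mounted storage. Note that Spark creates myfile.csv as a directory of part files, so the rename trick from the first answer's comments applies here too:

dbutils.fs.ls("/mnt/<mount-name>/myfile.csv")  # lists the part files Spark wrote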

– ASH