
I want to upload a DataFrame to a server as a Gzip-encoded CSV file without saving it to disk.

It is easy to write a Gzip-encoded CSV file using the spark-csv library:

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("result.csv.gz")

But I have no idea how to get an Array[Byte] representing my DataFrame that I can upload via HTTP.
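For reference, one way to produce such bytes is to build the CSV text in memory from collected rows and compress it with java.util.zip.GZIPOutputStream. This is only a sketch, not from the original post: the helper name toGzippedCsvBytes is hypothetical, and it only works for DataFrames small enough to collect to the driver (e.g. via df.collect()):

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Sketch: turn a header and collected rows into gzipped CSV bytes.
// Assumes the data fits in driver memory; no quoting/escaping of fields.
def toGzippedCsvBytes(header: Seq[String], rows: Seq[Seq[Any]]): Array[Byte] = {
  val csv = (header +: rows.map(_.map(_.toString)))
    .map(_.mkString(","))
    .mkString("\n")
  val bos  = new ByteArrayOutputStream()
  val gzip = new GZIPOutputStream(bos)
  gzip.write(csv.getBytes("UTF-8"))
  gzip.close() // must finish the gzip stream before reading the bytes
  bos.toByteArray
}
```

The resulting Array[Byte] can then be sent as the body of an HTTP request with Content-Encoding: gzip.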

Makrushin Evgenii

1 Answer


You could write to your remote server as if it were a remote HDFS server. You'd need HDFS installed on that machine, but after that you should be able to do something like:

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("hdfs://your_remote_server_hostname_or_ip/result.csv.gz")
randal25