I would like to write a Spark DataFrame to a StringIO buffer as a single-partition CSV. This single-partition CSV is then supposed to be sent to another server via FTP.

The following line does not seem to work:

df.repartition(1).write.csv(file_buffer,mode="overwrite", header=True)

The output is the following error:

py4j.protocol.Py4JError: An error occurred while calling o148.csv. Trace:
py4j.Py4JException: Method csv([class java.util.ArrayList]) does not exist

from ftplib import FTP
import StringIO

file_buffer = StringIO.StringIO()
df.repartition(1).write.csv(file_buffer,mode="overwrite", header=True)

ftp = FTP()
ftp.connect(host, 21)
ftp.login(user=user, passwd=pw)
ftp.storbinary('STOR test.csv', file_buffer)
ftp.quit()

I've also tried df.coalesce(1).write.csv(file_buffer, mode="overwrite", header=True), but that returns the same error. By the way, I can in principle write to S3 with the above-mentioned method. Many thanks in advance!
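
For reference, the error message suggests that Spark's csv writer expects a path string rather than a Python file object. A possible workaround, sketched here under the assumption that the data is small enough to collect onto the driver, is to build the CSV in memory with the standard library and hand that buffer to ftplib. The helper names `rows_to_csv_buffer` and `upload_csv` are hypothetical, and the sketch uses Python 3 (`io` instead of the Python 2 `StringIO` module):

```python
import csv
import io


def rows_to_csv_buffer(header, rows):
    """Serialize a header and rows into an in-memory CSV, returned as bytes buffer."""
    text = io.StringIO(newline="")
    writer = csv.writer(text)
    writer.writerow(header)
    writer.writerows(rows)
    # storbinary reads bytes, so encode the text; a new BytesIO starts at offset 0.
    return io.BytesIO(text.getvalue().encode("utf-8"))


def upload_csv(ftp, remote_name, buffer):
    """Upload the buffer; `ftp` is assumed to be a connected, logged-in ftplib.FTP."""
    # storbinary expects the full FTP command ("STOR <name>"), not just the file name.
    ftp.storbinary("STOR " + remote_name, buffer)


# Hypothetical usage with the names from the question (df, host, user, pw):
#
# from ftplib import FTP
# rows = [tuple(r) for r in df.collect()]  # pulls all rows to the driver
# buf = rows_to_csv_buffer(df.columns, rows)
# ftp = FTP()
# ftp.connect(host, 21)
# ftp.login(user=user, passwd=pw)
# upload_csv(ftp, "test.csv", buf)
# ftp.quit()
```

This bypasses df.write entirely, so it does not scale to data that does not fit in driver memory; in that case writing to a path first and uploading the resulting part file is probably the safer route.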

vikrant rana
C.Tomas
  • Have you tried using `coalesce` instead of `repartition`? `df.coalesce(1).write.csv(file_buffer,mode="overwrite", header=True)` – Aaron Arima Sep 04 '19 at 19:11
  • How can you *"write to S3 with the above mentioned method"*, if the object does not have the method that you are calling? – Martin Prikryl Sep 05 '19 at 05:25

0 Answers