
My target file on the FTP server is a ZIP file, and the .CSV is located two folders further in.

How can I use BytesIO to let pandas read the CSV without downloading it?

This is what I have so far:

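from ftplib import FTP
from io import BytesIO

ftp = FTP('FTP_SERVER')
ftp.login('USERNAME', 'PASSWORD')
flo = BytesIO()
# Stream the archive into the in-memory buffer instead of a file on disk
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)  # rewind so the buffer can be read from the start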

With flo as my BytesIO object of interest, how can I navigate a few folders down within the object so that pandas can read my .csv file? Is this even necessary?

Martin Prikryl

1 Answer


The zipfile module accepts file-like objects for both the archive and the individual files, so you can extract the CSV file without writing the archive to disk. And as read_csv also accepts a file-like object, everything should work fine (provided you have enough memory to hold the whole archive):

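from zipfile import ZipFile
import pandas as pd

...
flo = BytesIO()
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)
with ZipFile(flo) as archive:
    # Paths inside the archive use forward slashes, whatever the platform
    with archive.open('foo/fee/bar.csv') as fd:
        df = pd.read_csv(fd)  # add relevant options here, including the encoding if it matters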
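
If the exact path of the CSV inside the archive is not known in advance, it can be looked up first. The following is only a minimal sketch: find_csv_members is a hypothetical helper name, and a small in-memory archive stands in for the FTP download so the snippet runs on its own.

```python
from io import BytesIO
from zipfile import ZipFile


def find_csv_members(buffer):
    """Return the in-archive paths of all .csv entries (hypothetical helper)."""
    with ZipFile(buffer) as archive:
        # namelist() returns full member paths, e.g. 'foo/fee/bar.csv'
        return [name for name in archive.namelist()
                if name.lower().endswith('.csv')]


# Stand-in for the buffer filled by ftp.retrbinary (assumption for the demo)
demo = BytesIO()
with ZipFile(demo, 'w') as archive:
    archive.writestr('foo/fee/bar.csv', 'a,b\n1,2\n')
    archive.writestr('foo/readme.txt', 'not a csv')
demo.seek(0)

csv_paths = find_csv_members(demo)
print(csv_paths)
```

Any path returned this way can then be passed to archive.open(...) and on to pd.read_csv, exactly as above. ZipFile does not close a buffer it was handed, so the same BytesIO object can be reopened afterwards.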
Serge Ballesta