
I'm trying to:

  1. read a .csv file (compressed in a zip file stored on an FTP server) using ftplib
  2. store the .csv file in an in-memory virtual file using io
  3. turn the virtual file into a DataFrame using pandas

[Image: folder layout for the two scenarios, path1 and path2]

For that I'm using the code below, and it works fine for the first scenario (path1, see image above):

CODE:

import ftplib
import zipfile
import io
import pandas as pd

ftp = ftplib.FTP("theserver_name")
ftp.login("my_username","my_password")
ftp.encoding = "utf-8"

ftp.cwd('folder1/folder2')
filename = 'zipFile1.zip'

download_file = io.BytesIO()
ftp.retrbinary("RETR " + filename, download_file.write)
download_file.seek(0)
zfile = zipfile.ZipFile(download_file)

df = pd.read_csv(zfile.namelist()[0], delimiter=';')

display(df)

But in the second scenario (path2), after changing my code as shown, I get the error below:

CODE UPDATE:

ftp.cwd('folder1/folder2/')
filename = 'zipFile2.zip'

ERROR AFTER UPDATE:

FileNotFoundError: [Errno 2] No such file or directory: 'folder3/csvFile2.csv'

It seems like Python doesn't recognize folder3 (which is contained in zipFile2.zip). Is there an explanation for that? How can I fix it? I tried calling ftp.cwd('folder3') right before pd.read_csv(), but it doesn't work.

Timeless
  • This question confuses me greatly. You're trying to download a zip file into memory, right? But when you issue the `RETR` command, you pass `filename`, which contains the name of a csv file. – snwflk Aug 14 '22 at 11:45
  • Hi @snwflk, please consider the update I made on the code. `filename = 'zipFile1.zip'`. – Timeless Aug 14 '22 at 11:46
  • Which line gives you that error? – gtomer Aug 14 '22 at 11:47
  • Hi @gtomer, it's this one `df = pd.read_csv(zfile.namelist()[0], delimiter=";")` – Timeless Aug 14 '22 at 11:50
  • Not sure what I'm supposed to consider here. You only changed the filename, does the output change? Questions on SO should not be a moving target. – snwflk Aug 14 '22 at 11:50
  • @snwflk, thank you for the support. I don't think my question is a moving target. I'm trying to make it as clear/precise as possible for the SO community. I hope I can figure out a way to tell python how to recognize my `csvFile2.csv`. – Timeless Aug 14 '22 at 11:53
  • It becomes a moving target the second you make substantial changes to the code. Any change to the code means a complete reset for the reader. Changing the filename you give to the FTP library is a substantial edit in this question, seeing that the error you're getting is about files not being found. What makes it even more difficult to help is that you edited the code without modifying the error output that you get. – snwflk Aug 14 '22 at 12:00
  • Sorry about that. Still the same error output by the way! – Timeless Aug 14 '22 at 12:02

1 Answer

Thanks to Serge Ballesta's post here, I finally figured out how to turn csvFile2.csv into a DataFrame:

import ftplib
import zipfile
import io
import pandas as pd

ftp = ftplib.FTP("theserver_name")
ftp.login("my_username","my_password")
ftp.encoding = "utf-8"
    
flo = io.BytesIO()
ftp.retrbinary('RETR /folder1/folder2/zipFile2.zip', flo.write)
flo.seek(0)

with zipfile.ZipFile(flo) as archive:
    with archive.open('folder3/csvFile2.csv') as fd:
        df = pd.read_csv(fd, delimiter=';')
        
display(df)
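
As for why the original call failed: `zfile.namelist()` returns the names of members inside the archive as plain strings, and `pd.read_csv()` treats a string as a path on the local disk, so it never looks inside the zip. `ftp.cwd()` only changes the directory on the FTP server, which is why it had no effect. The first scenario presumably worked only because a file with that name happened to exist in the local working directory (that part is a guess). Below is a minimal sketch of the same idea without the hardcoded member name. It builds a made-up archive in memory instead of downloading one, uses a hypothetical helper `first_csv_member`, and reads with the standard `csv` module so it runs without pandas:

```python
import csv
import io
import zipfile

# Build a small archive in memory that mimics zipFile2.zip.
# The member name and CSV contents are made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("folder3/csvFile2.csv", "a;b\n1;2\n3;4\n")
buffer.seek(0)


def first_csv_member(archive):
    """Hypothetical helper: name of the first .csv member, skipping directories."""
    for info in archive.infolist():
        if not info.is_dir() and info.filename.lower().endswith(".csv"):
            return info.filename
    raise FileNotFoundError("no .csv member in the archive")


with zipfile.ZipFile(buffer) as archive:
    # This is a name inside the archive, not a path on disk.
    member = first_csv_member(archive)
    with archive.open(member) as fd:
        # archive.open() returns a binary file object. With pandas,
        # pd.read_csv(fd, delimiter=';') can take it directly; the csv
        # module needs text, hence the TextIOWrapper.
        text = io.TextIOWrapper(fd, encoding="utf-8", newline="")
        rows = list(csv.reader(text, delimiter=";"))
```

With a real download, `buffer` would be the `io.BytesIO` object filled by `ftp.retrbinary(...)`, and the `csv.reader` line would be replaced by the `pd.read_csv(fd, delimiter=';')` call from the code above.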
Timeless