20 columns and 10000 rows is not that large, so unless you are working on an embedded system with tiny storage, the simplest way is to just download the file with ftplib
and then load it into a dataframe with read_csv
(see NicolasDupuy's answer).
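For completeness, a minimal sketch of that simpler route could look like this; the host, credentials, folder and file name below are only placeholders:

import ftplib
import pandas as pd

ftp = ftplib.FTP('ftp.xyz.com', 'random', 'password')   # placeholder host and credentials
ftp.cwd('rootfolder/testfolder')

# download to a local file, then let read_csv do all the parsing and type guessing
with open('testreport.csv', 'wb') as f:
    ftp.retrbinary('RETR testreport.csv', f.write)
ftp.quit()

df = pd.read_csv('testreport.csv')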
If, for whatever reason, you really want to avoid storing it on the local disk, it will be a little trickier, because pandas read_csv
expects a file (or file-like object) while ftplib hands you the data line by line through a callback. That means you will have to parse the lines by hand, or with the csv
module, and then feed a dataframe from that.
Code could be:
import csv
import ftplib
import pandas as pd

ftp = ftplib.FTP('ftp.xyz.com', 'random', 'password')
ftp.cwd('rootfolder/testfolder')

first = True       # used to identify the header line
data = []
columns = None

def process_row(line):
    global first, columns
    if first:
        columns = parse(line)
        first = False
    else:
        data.append(parse(line))

def parse(line):
    # assume a trivial csv file here - use the csv module for more complex use cases
    # (retrlines already delivers decoded text, one line at a time)
    return line.strip().split(',')

ftp.retrlines('RETR testreport.csv', process_row)

df = pd.DataFrame(data=data, columns=columns)
# or for automatic conversion to float:
# df = pd.DataFrame(data=data, columns=columns, dtype=float)
Beware:
- the above code is untested and may contain typos...
- the above code contains no exception handling and will be unsafe if given incorrect input
- as you cannot use read_csv, you do not get its automatic guessing of the column types (see the conversion sketch just after this list)
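If only some columns need to be numeric, one possible workaround is to convert them explicitly after building the DataFrame; the column names below are purely hypothetical:

# convert the columns you know are numeric by hand,
# since nothing guessed their types for us
for col in ('price', 'quantity'):   # hypothetical column names
    df[col] = pd.to_numeric(df[col], errors='coerce')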
Said differently: unless you have a strong reason to do so, do not use that approach and just download the file first....