I'm writing an FTP client using Twisted that downloads a lot of files, and I'm trying to do it reasonably intelligently. However, I keep hitting the same problem: several files download very quickly (sometimes ~20 per batch, sometimes ~250), then the downloading hangs until the connections eventually time out, and then the download-and-hang cycle starts all over again. I'm using a DeferredSemaphore to download only 3 files at a time, but I now suspect that this is not the right way to avoid overwhelming the server.
Here is the code in question:
import os

from twisted.internet.defer import DeferredSemaphore, gatherResults
from twisted.python import log

# these are methods on my client class; filterFiles and FTPFile are my
# own helpers (FTPFile is sketched further down)
def downloadFiles(self, result, directory):
    # make the download directory if it doesn't already exist
    if not os.path.exists(directory['filename']):
        os.makedirs(directory['filename'])
    log.msg("Downloading files in %r..." % directory['filename'])
    files = filterFiles(None, self.fileListProtocol)
    # from http://stackoverflow.com/questions/2861858/queue-remote-calls-to-a-python-twisted-perspective-broker/2862440#2862440
    # use a DeferredSemaphore to limit the number of files downloaded
    # simultaneously from the directory to 3
    sem = DeferredSemaphore(3)
    jobs = [sem.run(self.downloadFile, f, directory) for f in files]
    d = gatherResults(jobs)
    return d

def downloadFile(self, f, directory):
    filename = os.path.join(directory['filename'], f['filename']).encode('ascii')
    log.msg('Downloading %r...' % filename)
    d = self.ftpClient.retrieveFile(filename, FTPFile(filename))
    return d
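For context, downloadFiles runs as a callback after the directory listing completes, which is why it takes a result argument it never uses. The wiring looks roughly like this (a sketch; the list call and fileListProtocol attribute stand in for my surrounding code):

# sketch: downloadFiles is chained off the directory listing, so
# Twisted passes the listing's result in as the first argument
# (which downloadFiles ignores)
d = self.ftpClient.list(directory['filename'], self.fileListProtocol)
d.addCallback(self.downloadFiles, directory)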
You'll notice that I'm reusing a single FTP connection (in active mode, by the way) and using my own FTPFile instance to make sure the local file object gets closed when the file's data connection is 'lost' (i.e. the transfer completed). Looking at FTPClient, I wonder if I should be using queueCommand directly. To be honest, I got lost following retrieveFile down to _openDataConnection and beyond, so maybe it's already being used.
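For reference, FTPFile is just a minimal protocol that streams received bytes to disk and closes the file when the data connection drops; roughly:

from twisted.internet.protocol import Protocol

class FTPFile(Protocol):
    """Stream received bytes to a local file; close it when the
    data connection is lost (i.e. the transfer is complete)."""

    def __init__(self, filename):
        self.fObj = open(filename, 'wb')

    def dataReceived(self, data):
        self.fObj.write(data)

    def connectionLost(self, reason):
        self.fObj.close()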
Any suggestions? Thanks!