I've written some code to log on to an AS/400 FTP site, move to a certain directory, and locate files I need to download. It works, but when there are MANY files to download I receive:

socket.error: [Errno 10054] An existing connection was 
              forcibly closed by the remote host

I log on and navigate to the appropriate directory successfully:

import ftplib
import os
import sys

try:
    newSession = ftplib.FTP(URL, username, password)
    newSession.set_debuglevel(3)
    newSession.cwd("SOME DIRECTORY")
except ftplib.all_errors as e:
    print str(e).split(None, 1)
    sys.exit(0)

I grab a list of the files I need:

filesToDownload= filter(lambda x: "SOME_FILE_PREFIX" in x, newSession.nlst())

And here is where it is dying (specifically the newSession.retrbinary('RETR '+f,tempFileVar.write)):

for f in filesToDownload:
    newLocalFileName = f + ".edi"
    newLocalFilePath = os.path.join(directory, newLocalFileName)
    tempFileVar = open(newLocalFilePath, 'wb')
    newSession.retrbinary('RETR ' + f, tempFileVar.write)
    tempFileVar.close()

It downloads upwards of 85% of the files I need before I'm hit with Errno 10054, and I'm confused as to why it seems to die arbitrarily when so close to completion. My honest guess right now is that I'm making too many requests to the FTP server when trying to pull these files.
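One mitigation worth sketching (untested against an AS/400; `reconnect`, `download_all`, and the connection parameters here are hypothetical stand-ins for the names used above): wrap each `retrbinary` in a retry loop that reopens the session when the server forcibly drops the connection, so one disconnect doesn't kill the whole batch.

```python
import ftplib
import os
import socket

def reconnect(url, user, pwd, remote_dir):
    """Open a fresh FTP session and return to the working directory."""
    session = ftplib.FTP(url, user, pwd)
    session.cwd(remote_dir)
    return session

def download_all(session, url, user, pwd, remote_dir, local_dir, files,
                 max_retries=3):
    """Download each file, reopening the session if the server drops it."""
    for f in files:
        local_path = os.path.join(local_dir, f + ".edi")
        for attempt in range(max_retries):
            try:
                with open(local_path, 'wb') as out:
                    session.retrbinary('RETR ' + f, out.write)
                break  # this file succeeded; move on to the next
            except (socket.error, ftplib.error_temp):
                # Connection was forcibly closed: open a new session
                # and retry the same file from scratch (the 'wb' open
                # truncates any partial download).
                session = reconnect(url, user, pwd, remote_dir)
        else:
            raise RuntimeError("gave up on %s after %d retries"
                               % (f, max_retries))
    return session
```

This doesn't explain *why* the server drops the connection, but it keeps a single 10054 from aborting the remaining downloads.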


Any advice or pointers would be awesome. I'm still trying to troubleshoot this.

Stephen Tetreault
  • @JaneT: The Python `ftplib` module is at a lower level than your typical FTP client software, so there isn't any `mget`. – John Y Apr 05 '13 at 13:24
  • Is third-party software an option? I have found it easier to script FTP using something like WinSCP than directly in Python. Actually, I still use Python to generate the WinSCP scripts, invoke WinSCP on those scripts, and process the files locally. Python and WinSCP are a good combination when your local machine is Windows (which I can see it is from your traceback). – John Y Apr 05 '13 at 13:31
  • @JaneT I don't have access to the joblog on the AS/400 unfortunately. It is on the client end and even if I made a request for it there'd be a slow turn around. – Stephen Tetreault Apr 05 '13 at 13:48
  • @JohnY I'd like to not regard 3rd party software as an option to be honest. We had a (outdated) Perl script on a server we are migrating our data processing tools from and it was working semi-successfully and as I mentioned in my post it DOES work but can't handle larger volumes of downloads which is frustrating. – Stephen Tetreault Apr 05 '13 at 13:52
  • I mention WinSCP because it is free (including source) and quite robust. If you are only having problems with larger jobs, could you perhaps break up your requests into smaller chunks (separate sessions)? Like maybe only grab at most n files in any given session. – John Y Apr 05 '13 at 14:33
  • @JohnY thanks for the reply. I'm trying to come up with some functionality to perhaps emulate an MGET command. I would like to break this up into smaller chunks or separate session though the core functionality I have to achieve is to download whatever number of files there are posted for that day. The annoying thing is the VOLUME of the files I'm downloading is only ~40mb but the way the files are broken up by the client means there are hundreds, maybe thousands of small binary files that comprise the ~40mb total size. – Stephen Tetreault Apr 05 '13 at 15:00
  • @JohnY just an update if you're interested: I've been in touch with the client's IT team they too are very confused as to why their AS400 system is behaving this way. For now I'm just working on catching the socket error and trying to resume where I left off in the download process. Frustrating, but I guess it's all I can do for now. – Stephen Tetreault Apr 05 '13 at 16:11
  • have you tried adding a keep alive call during the process? Assuming the files are coming down in new sessions, it is possible the main session is ending? Sorry don't know python as we just use the builtin windows client to bring files down. – Jane T Apr 06 '13 at 11:42
  • Hi @JaneT thanks for responding and no worries about not being too familiar with Python. The issue isn't one of timing out, if that's what you're referring to. The problem seems to be on their (the client's) end, and one of their IT people is actually surprised at the behavior happening on the FTP. I'm trying to come up with a work around but the main issue, like I said, isn't one of the main session ending but a socket error being thrown resulting in a fatal disconnection. – Stephen Tetreault Apr 06 '13 at 14:09
  • @JaneT My idea right now is to catch the socket error when it is thrown, try and keep track of the last file downloaded and resume downloading a subset of the original FTP file list to download all the complete data. I'm not really happy about the workaround, but hey I guess if the functionality is necessary it should be implemented. – Stephen Tetreault Apr 06 '13 at 14:11

1 Answer

There's no real answer to this, I suppose: the client's FTP server appears to be at fault here, as it's incredibly unstable. The best I can do is a hacky workaround: catch the thrown socket error and resume where the previous session left off before being forcibly disconnected. The client's IT team is finally looking into the problem on their end.
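In sketch form, that catch-and-resume workaround might look roughly like this (`make_session` and `open_local` are hypothetical callables standing in for the login/`cwd` code and file-opening code from the question; a real version would also want a retry cap so a permanently dead server can't loop forever):

```python
import socket

def fetch_with_resume(make_session, files, open_local):
    """Download every file in `files`, resuming after forced disconnects.

    make_session() returns a logged-in ftplib.FTP already cwd'd to the
    right directory; open_local(name) returns a writable binary file.
    """
    done = set()
    session = make_session()
    while len(done) < len(files):
        try:
            for f in files:
                if f in done:
                    continue
                out = open_local(f)
                try:
                    session.retrbinary('RETR ' + f, out.write)
                finally:
                    out.close()
                done.add(f)
        except socket.error:
            # Forcibly disconnected (Errno 10054): reconnect and
            # continue with whatever hasn't finished yet.
            session = make_session()
    return done
```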

Sigh.

Stephen Tetreault