
So, I've been able to use multiprocessing to upload multiple files at once to a given server with the following two functions:

import ftplib,multiprocessing,subprocess

def upload(t):
    server=locker.server
    user=locker.user
    password=locker.password
    service=locker.service #These all just return strings representing the various fields I will need.
    ftp=ftplib.FTP(server)
    ftp.login(user=user,passwd=password,acct="")
    with open(t,"rb") as f: #close the file handle once the transfer finishes
        ftp.storbinary("STOR "+t.split('/')[-1],f)
    ftp.close() # Doesn't seem to be necessary, same thing happens whether I close this or not

def ftp_upload(t=files,server=locker.server,user=locker.user,password=locker.password,service=locker.service):
    parsed_targets=parse_it(t)
    ftp=ftplib.FTP(server)
    ftp.login(user=user,passwd=password,acct="")
    remote_files=ftp.nlst(".")
    ftp.close()
    files_already_on_server=[f for f in t if f.split("/")[-1] in remote_files]
    files_to_upload=[f for f in t if f not in files_already_on_server]
    connections_to_make=3 #The maximum number of connections the server allows is 5, and this error pops up even if I use 1
    pool=multiprocessing.Pool(processes=connections_to_make)
    pool.map(upload,files_to_upload)

My problem is that I (very regularly) end up getting errors such as:

File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
ftplib.error_temp: 421 Too many connections (5) from this IP

Note: There's also a timeout error that occasionally occurs, but I'm waiting for it to rear its ugly head again, at which point I'll post it.

I don't get this error when I use the command line (i.e. "ftp -inv", "open SERVER", "user USERNAME PASSWORD", "mput *.rar"), even when I have (for example) 3 instances of this running at once.

I've read through the ftplib and multiprocessing documentation, and I can't figure out what it is that is causing these errors. This is somewhat of a problem because I'm regularly backing up a large amount of data and a large number of files.

  1. Is there some way I can avoid these errors, or is there a different way of having the script do this?
  2. Is there a way I can tell the script that if it hits this error, it should wait for a second and then resume its work?
  3. Is there a way I can have the script upload the files in the same order they appear in the list (of course, speed differences would mean they wouldn't always be 4 consecutive files, but at the moment the order seems basically random)?
  4. Can someone explain why/how more connections are being made to this server simultaneously than the script is calling for?

So, just handling the exceptions seems to be working (except for the occasional recursion error... I still have no idea what's going on there).

As per #3, I wasn't looking for the order to be 100% exact, only for the script to pick the next file in the list to upload (differences in process speeds would still keep the order from being completely sequential, but there would be less variability than in the current system, which seems almost unordered).

user
  • Maybe the library is automatically creating concurrent connections for better performance. [I published a detailed answer about this problem in general, which might be helpful](https://stackoverflow.com/questions/30797269/warning-ftp-login-i-cant-accept-more-than-6-connections-as-the-same-user-in/45131091#45131091) – Armin Šupuk Jul 16 '17 at 17:19

2 Answers


You could try to use a single ftp instance per process:

import os, ftplib, multiprocessing

def init(*credentials):
    global ftp # each worker process gets its own module-level FTP connection
    server, user, password, acct = credentials
    ftp = ftplib.FTP(server)
    ftp.login(user=user, passwd=password, acct=acct)

def upload(path):
    with open(path, 'rb') as file:
        try:
            ftp.storbinary("STOR " + os.path.basename(path), file)
        except ftplib.error_temp as error: # handle temporary error
            return path, error
        else:
            return path, None

def main():
    # ...
    pool = multiprocessing.Pool(processes=connections_to_make,
                                initializer=init, initargs=credentials)
    for path, error in pool.imap_unordered(upload, files_to_upload):
        if error is not None:
           print("failed to upload %s" % (path,))
jfs

specifically answering (2): Is there a way I can tell the script that if it hits this error, it should wait for a second and then resume its work?

Yes.

ftplib.error_temp: 421 Too many connections (5) from this IP

This is an exception. You can catch it and handle it. Python doesn't optimize tail calls, so recursing like this is poor form, but the fix can be as simple as this:

from time import sleep

def upload(t):
    server=locker.server
    user=locker.user
    password=locker.password #These return the string fields, as in the question
    ftp=None
    try:
        ftp=ftplib.FTP(server)
        ftp.login(user=user,passwd=password,acct="")
        with open(t,"rb") as f:
            ftp.storbinary("STOR "+t.split('/')[-1],f)
        ftp.close() # Doesn't seem to be necessary, same thing happens whether I close this or not
    except ftplib.error_temp:
        if ftp is not None:
            ftp.close()
        sleep(2)
        upload(t) #retry; see the loop version below for a bounded alternative
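
A recursive retry can hit Python's recursion limit if the server keeps refusing, which is likely where the occasional recursion error mentioned in the update comes from. Here is a bounded, iterative sketch of the same retry, assuming the same `locker` object from the question (`max_attempts` is a hypothetical cap):

from time import sleep
import ftplib

def upload(t, max_attempts=10):
    for attempt in range(max_attempts):
        ftp=None
        try:
            ftp=ftplib.FTP(locker.server)
            ftp.login(user=locker.user,passwd=locker.password,acct="")
            with open(t,"rb") as f:
                ftp.storbinary("STOR "+t.split('/')[-1],f)
            return #success
        except ftplib.error_temp:
            sleep(2) #give the server a moment to drop stale connections
        finally:
            if ftp is not None:
                ftp.close()
    raise RuntimeError("gave up on %s after %d attempts"%(t,max_attempts))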

As for your question (3): if that is what you want, do the uploads serially, not in parallel.
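
For example, a minimal serial sketch, assuming the `locker` object and `files_to_upload` list from the question:

import os, ftplib

ftp=ftplib.FTP(locker.server)
ftp.login(user=locker.user,passwd=locker.password,acct="")
for t in files_to_upload:
    with open(t,"rb") as f:
        ftp.storbinary("STOR "+os.path.basename(t),f) #one file at a time, in list order
ftp.quit() #end the session cleanly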

I look forward to your posting an update with an answer to (4). The only thing that comes to mind is some other process holding an FTP connection to this IP.

jrwren