7

I need to pull the latest files from an FTP folder. The problem is that the filenames contain spaces and follow a specific pattern. Below is the code I have implemented:

from __future__ import with_statement  # __future__ imports must come before all other imports
import sys
from ftplib import FTP
import os
import socket
import time
import pandas as pd
import numpy as np
from glob import glob
import datetime as dt

ftp = FTP('')
ftp.login('','')
ftp.cwd('')
ftp.retrlines('LIST')

filematch='*Elig.xlsx'
downloaded = []

for filename in ftp.nlst(filematch):
  print('Getting ' + filename)
  # The with block closes the local file even if the transfer fails.
  with open(filename, 'wb') as fhandle:
    ftp.retrbinary('RETR ' + filename, fhandle.write)
  downloaded.append(filename)

ftp.quit()

I understand that I can pass an empty list's append method to ftp.dir() to collect the listing, but since the filenames contain spaces, I am unable to split each line correctly and pick the latest file of the type mentioned above.

Any help would be great.

Manas Jani
  • What is the behavior of the posted program? Does it work correctly for you? Does it print an error message? Does it do something else entirely? – Robᵩ Sep 20 '17 at 15:41
  • It works fine for pulling the files that I want, and I did so as a one-time process. Going forward, though, I need to automate it and pick only the latest files, based on date. – Manas Jani Sep 20 '17 at 15:54
  • For future reference, giving us an example filename would be neat. Just so we know how it actually looks. – Torxed Sep 28 '17 at 07:17
  • ABC File 1 of 3_XXX_MV2_PElig.xlsx, here you go... but I guess the filename should not really be that important! Since the above code already had a file pattern that I had mentioned. – Manas Jani Sep 28 '17 at 12:43
  • If you are communicating with only one specific FTP server, it should be possible to parse the LIST output for timestamps despite spaces in filenames. Unless MDTM is available (R.Neumann's answer), I see no other way. – VPfB Sep 28 '17 at 12:57
  • The LIST output has the timestamp, but I want to iterate over it and pull out the latest file. I thought ftp.retrlines('LIST' -t *Elig.xlsx) would give me the entries in the right order, but it isn't helping. – Manas Jani Sep 28 '17 at 13:01
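To illustrate the point made in the comments about parsing LIST output despite spaces in filenames, here is a minimal sketch. It assumes the server emits the common Unix-style nine-column layout (permissions, links, owner, group, size, month, day, time or year, name); other servers format LIST differently, so this is not portable. The helper name parse_list_line is hypothetical.

```python
def parse_list_line(line):
    """Split one Unix-style LIST line into (timestamp_text, filename).

    Assumed layout (not guaranteed by every server):
    perms links owner group size month day time-or-year name
    """
    # Split on whitespace at most 8 times, so the 9th field (the
    # filename) keeps any spaces it contains.
    parts = line.split(None, 8)
    if len(parts) < 9:
        raise ValueError('unexpected LIST line: %r' % line)
    return ' '.join(parts[5:8]), parts[8]


# Example with a line shaped like typical Unix-style LIST output.
sample = ('-rw-r--r--   1 owner group   12345 Sep 20 15:41 '
          'ABC File 1 of 3_XXX_MV2_PElig.xlsx')
stamp, name = parse_list_line(sample)
```

The lines themselves could be collected with something like ftp.retrlines('LIST', lines.append). Note that this style of listing usually shows either a time or a year, not both, which makes the timestamp text awkward to sort on reliably.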

2 Answers

5

You can get each file's modification time (mtime) by sending the MDTM command, provided the FTP server supports it, and then sort the files by that timestamp.

def get_newest_files(ftp, limit=None):
    """Retrieves newest files from the FTP connection.

    :ftp: The FTP connection to use.
    :limit: Abort after yielding this amount of files.
    """

    files = []

    # Decorate files with mtime.
    for filename in ftp.nlst():
        response = ftp.sendcmd('MDTM {}'.format(filename))
        _, mtime = response.split()
        files.append((mtime, filename))

    # Sort files by mtime and break after limit is reached.
    for index, decorated_filename in enumerate(sorted(files, reverse=True)):
        if limit is not None and index >= limit:
            break

        _, filename = decorated_filename  # Undecorate
        yield filename


downloaded = []

# Retrieves the newest file from the FTP server.
for filename in get_newest_files(ftp, limit=1):
    print('Getting ' + filename)

    # Use a name that does not shadow the built-in `file`.
    with open(filename, 'wb') as fhandle:
        ftp.retrbinary('RETR ' + filename, fhandle.write)

    downloaded.append(filename)
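
The function above considers every name returned by nlst(). A possible variation, sketched below, restricts the candidates to the question's *Elig.xlsx pattern and parses the MDTM reply into a datetime. It assumes the server answers MDTM with a reply of the form "213 YYYYMMDDHHMMSS", optionally followed by fractional seconds; the helper names parse_mdtm and newest_matching are hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def parse_mdtm(response):
    """Turn an MDTM reply such as '213 20170920154100' into a datetime."""
    stamp = response.split()[1]
    # Ignore optional fractional seconds after the 14-digit timestamp.
    return datetime.strptime(stamp[:14], '%Y%m%d%H%M%S')


def newest_matching(ftp, pattern='*Elig.xlsx'):
    """Return the newest remote filename matching pattern, or None."""
    candidates = []
    for name in ftp.nlst():
        if fnmatchcase(name, pattern):
            mtime = parse_mdtm(ftp.sendcmd('MDTM ' + name))
            candidates.append((mtime, name))
    if not candidates:
        return None
    # Tuples compare by mtime first, so max() picks the latest file.
    return max(candidates)[1]
```

With a logged-in connection, newest_matching(ftp) would return a single filename that can then be passed to retrbinary as in the loop above.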
Richard Neumann
  • I tried running this code, but it still pulls all the files of the corresponding type from the FTP server, not just the latest of them. – Manas Jani Sep 28 '17 at 13:41
  • Thank you so much! This worked... I just had to add the argument to reverse the sorted(files), to pick up the latest file and also change the limit to 1 to pick up just the latest file. Once again, thank you for the help! – Manas Jani Sep 28 '17 at 14:51
1

The issue is that the FTP "LIST" command returns text meant for humans, whose format depends on the FTP server implementation.

Using PyFilesystem (in place of the standard ftplib) gives you a listing API (search for "walk") that returns Pythonic structures describing the files and directories hosted on the FTP server.

http://pyfilesystem2.readthedocs.io/en/latest/index.html
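
If staying with the standard library is preferable, a different technique from the one this answer recommends is the machine-readable MLSD listing, exposed as FTP.mlsd() in Python 3.3 and later. The sketch below assumes the server supports MLSD and reports the "type" and "modify" facts; the helper name newest_by_mlsd is hypothetical.

```python
def newest_by_mlsd(ftp, suffix='Elig.xlsx'):
    """Return the newest file whose name ends with suffix, or None.

    Relies on the 'modify' fact (YYYYMMDDHHMMSS, optionally with
    fractional seconds), which sorts correctly as plain text.
    """
    newest_name, newest_stamp = None, ''
    for name, facts in ftp.mlsd():
        stamp = facts.get('modify', '')
        is_file = facts.get('type', '').lower() == 'file'
        if is_file and name.endswith(suffix) and stamp > newest_stamp:
            newest_name, newest_stamp = name, stamp
    return newest_name
```

Because MLSD returns each name as a separate field, spaces in filenames need no special handling here.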

glenfant