I connect to an SFTP server and list its files sorted by modification time.

Currently, it is done using something like:

  1. files = os.listdir(SFTP)
  2. Loop over files and get the timestamp using os.stat.
  3. Sort the final list in Python.

The loop in step 2 is very costly when the SFTP share lives on a different machine, because every os.stat call makes a separate network round trip from the server to the SFTP host, one per file.
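For reference, the three steps above can be sketched roughly as follows. This is a minimal reconstruction, not the actual production code; the function name `list_by_mtime_slow` and the `directory` parameter are made up for illustration, and `directory` is assumed to be the already-mounted path of the share.

```python
import os


def list_by_mtime_slow(directory):
    """Return file names in `directory`, oldest first, using one os.stat per file."""
    names = os.listdir(directory)              # step 1: fetch names only
    stamped = []
    for name in names:                         # step 2: one stat call per entry
        full_path = os.path.join(directory, name)
        stamped.append((os.stat(full_path).st_mtime, name))
    stamped.sort()                             # step 3: sort by timestamp
    return [name for _mtime, name in stamped]
```

With N files this issues N separate stat calls after the listing, which is where the time goes on a remote share.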

Is there a way to get both the file and modified time using os.listdir or a similar API?

I am using a Windows back-end, and the connection to the SFTP share is usually made with the win32wnet.WNetAddConnection2 function (from pywin32). A generic solution would be ideal, but a Windows-specific one is fine too.

Nishant
  • Can you explain why it would require a roundtrip? How are you sending commands to the server and how do you have the server run them? – Qwerty Feb 12 '18 at 14:16
  • There's a `scandir.walk()` that returns entries instead of names. On Windows it saves a lot of time. Are you using Windows? – Jean-François Fabre Feb 12 '18 at 14:16

3 Answers

You should use a dedicated library for this, such as paramiko or pysftp for SFTP, or ftplib for plain FTP; they provide utilities that will help here. Alternatively, you can run the relevant command directly on the server.
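As one possible illustration, paramiko's `SFTPClient.listdir_attr` returns names together with their attributes, including `st_mtime`, so no per-file stat is needed. The sketch below assumes the share is reachable over SFTP and that `sftp` is an already-connected client; the helper name `list_sorted_by_mtime` is hypothetical.

```python
def list_sorted_by_mtime(sftp, remote_dir="."):
    """Return (filename, mtime) pairs for remote_dir, oldest first.

    `sftp` is assumed to be a connected paramiko.SFTPClient, or any
    object exposing a compatible listdir_attr(path) method.
    """
    # A single listing call returns names and attributes together.
    entries = sftp.listdir_attr(remote_dir)
    ordered = sorted(entries, key=lambda attr: attr.st_mtime)
    return [(attr.filename, attr.st_mtime) for attr in ordered]
```

With paramiko, such a client is typically obtained by calling `open_sftp()` on a connected `paramiko.SSHClient`; host name and credentials depend on your setup.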

voltento

If you're able to send one-line commands to the server, you could do [os.stat(i) for i in os.listdir()]

If that doesn't work for you, I suppose you could just do os.system("ls -l")

If neither of those works, please do tell me!

Qwerty
  • But `[os.stat(i) for i in os.listdir()]` is pretty much what we are already doing, right? It is just a one-liner for the same thing. The latter one is a subprocess call, which I am trying to avoid. – Nishant Feb 12 '18 at 14:26
  • @Nishant for the first one, if you're sending the first one for the server to execute, I don't understand why you think it would be a roundabout trip. Please explain? – Qwerty Feb 12 '18 at 14:30
  • The server and the SFTP host are usually two different machines. That is why there is a round trip. Does that make sense? I specifically use `win32wnet.WNetAddConnection2` on the server. – Nishant Feb 12 '18 at 14:32
  • @Nishant but if you execute the entire line on the server, there is no need to send information back to the client until you have the data you need with the timestamps. – Qwerty Feb 12 '18 at 14:33
  • Yes, but the problem here is not the client-server round trip (that happens just once); it is the server-to-SFTP round trip. – Nishant Feb 12 '18 at 14:36

If you're using Windows, you have a lot to gain from using os.scandir() (Python 3.5+) or the backport scandir module: scandir.scandir()

That's because on Windows (unlike Linux/Unix), os.listdir() already retrieves file attributes behind the scenes, but everything except the name is discarded, which forces you to make another stat call per file.

scandir returns an iterator of directory entries (os.DirEntry objects), not names. On Windows, the size, timestamp, and file-type fields are already populated, so calling stat on the entry (as in the example below) costs no additional system call:

(taken from https://www.python.org/dev/peps/pep-0471/)

import os


def get_tree_size(path):
    """Return total size of files in given path and subdirs."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total += get_tree_size(entry.path)
        else:
            total += entry.stat(follow_symlinks=False).st_size
    return total

So just replace your first os.listdir() call with os.scandir(), and you'll have all the information for the same cost as a plain os.listdir().

(This matters most on Windows and much less on Linux. On a slow Windows filesystem, I got an 8x performance gain compared to good old os.listdir followed by os.path.isdir.)
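Applied to the question's use case (files ordered by modification time), a minimal sketch might look like this. The function name `list_by_mtime` is illustrative, and the `with` form of os.scandir assumes Python 3.6 or later.

```python
import os


def list_by_mtime(directory):
    """Return (name, mtime) pairs for regular files in `directory`, oldest first."""
    results = []
    with os.scandir(directory) as entries:   # context manager: Python 3.6+
        for entry in entries:
            if entry.is_file():
                # On Windows the stat data normally comes from the
                # directory listing itself, so no extra call is made here.
                results.append((entry.name, entry.stat().st_mtime))
    results.sort(key=lambda pair: pair[1])
    return results
```

Whether this removes all per-file round trips on a given network share depends on the platform and on how the share is mounted, so it is worth timing against the os.listdir plus os.stat version.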

Jean-François Fabre