
Is it possible to calculate the last modified time of a directory, taking changing file contents into account?

We are trying to watch a series of upload directories to determine when user FTP sessions are complete.

Users are uploading a set of files into a specific directory, and we'd like to detect when the last file within that directory hasn't changed within N minutes. (We are using this as a proxy for "FTP session is done".)

We started with this to find directories that have been idle for more than 5, but less than 10, minutes:

find . -mmin +5 -mmin -10 -type d -ls

The directory timestamp used here changes only when an entry in the directory is created, deleted, or renamed, so in practice it reflects the time the most recent file was added to the directory.

I have read "Directory last modified date", and it is clear that testing the directory's own mtime (with -mtime or -mmin) won't work, since it doesn't change when the contents of files inside the directory are updated. So the command above fails when the last file is large and takes more than 10 minutes to upload: the directory is reported as idle after 5 minutes even though that file is still being written.
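A quick way to see this for yourself is a throwaway experiment like the one below. It is a sketch that assumes GNU coreutils (`stat -c %Y` prints an mtime in seconds since the epoch, and `touch -d` accepts a date string). It backdates the directory first, so any update to the directory's mtime during the append would show up as a different value.

```shell
#!/bin/sh
# Throwaway demo (GNU coreutils assumed): appending to an existing file
# changes that file's mtime but leaves its parent directory's mtime alone.
demo=$(mktemp -d)
echo first > "$demo/upload.bin"             # creating the file does update the dir
touch -d '2012-03-23 07:00:00' "$demo"      # so backdate the directory afterwards
before=$(stat -c %Y "$demo")
echo more >> "$demo/upload.bin"             # simulate more data arriving
after=$(stat -c %Y "$demo")
echo "directory mtime before append: $before"
echo "directory mtime after append:  $after"
echo "file mtime:                    $(stat -c %Y "$demo/upload.bin")"
rm -r "$demo"
```

The two directory values come out identical, while the file's mtime is the current time.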

Are there any shell-based alternatives (ideally a configuration of the find command) that use the mtime of the most recently changed file inside a directory as its timestamp, but still operate at the directory level (i.e. we don't want multiple hits for all the files inside a single directory)?

Ramon
    Consider an alternate solution, as I don't think you're going to find the options that you're looking for (maybe you can code it). If you're getting files from a commercial source as part of a contract, it's not unreasonable to ask them to send a special 'flag' file as the very last file. You can then just loop, looking for the flag file, and when it has arrived you can start your processing (which should include a validation step, to be sure that all the files you expect are there, that they aren't the same as yesterday's, and that they aren't empty, unless that is OK). The flag file could be a file listing of size – shellter Mar 23 '12 at 03:45
  • Unfortunately these tend to come in from very non-technical users who are just dropping files into a client UI. We are looking to distribute a custom client that would use this flag method behind the scenes, but we'd still like a fallback for scenarios where a custom client deployment won't work – Ramon May 06 '12 at 21:35
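The flag-file approach from the comment above could look roughly like this. It is only a sketch: the marker name `UPLOAD_DONE`, its format (one expected file name per line), and the helper names `wait_for_flag` and `validate_upload` are all hypothetical, not something any FTP client produces on its own.

```shell
#!/bin/sh
# Sketch of the flag-file approach. Hypothetical convention: the uploader
# sends UPLOAD_DONE last, listing one expected file name per line.

# Poll until the marker file exists in directory $1.
wait_for_flag() {
  dir=$1
  interval=${2:-60}                 # seconds between polls
  while [ ! -f "$dir/UPLOAD_DONE" ]; do
    sleep "$interval"
  done
}

# Check that every file named in the marker exists and is non-empty.
# Returns 0 if all are present, 1 otherwise (problems reported on stderr).
validate_upload() {
  dir=$1
  status=0
  while IFS= read -r name; do
    [ -n "$name" ] || continue      # ignore blank lines
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $name" >&2
      status=1
    fi
  done < "$dir/UPLOAD_DONE"
  return "$status"
}
```

For example, `wait_for_flag /srv/ftp/uploads/joe 60 && validate_upload /srv/ftp/uploads/joe` (path hypothetical) blocks until the marker arrives, then reports any missing or empty files.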

2 Answers


I agree with @shellter's comment that a flag file is the best way to go. That depends on your users agreeing to upload that file, though.

To find the most recent file under the current directory (this needs GNU find, since -printf is a GNU extension):

find . -type f -printf "%T@ %p\n" | sort -nr | head -1

The output is 2 fields:

  1. the file's last-modification time in seconds since the epoch, with a fractional part
  2. the relative path to the file
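To get one result per directory rather than one per file, the same `find -printf` idea can be applied to each upload directory in turn. The sketch below is an illustration under stated assumptions: the function names `newest_mtime` and `idle_dirs` are hypothetical, it assumes GNU find and `date +%s`, and it treats each immediate subdirectory of the given root as one upload directory. It reports directories whose newest file is more than MIN but less than MAX minutes old, mirroring the `-mmin +5 -mmin -10` window from the question.

```shell
#!/bin/sh
# Requires GNU find (-printf) and date +%s.

# Print the newest file mtime under $1 in whole seconds since the epoch
# (prints nothing if the directory contains no regular files).
newest_mtime() {
  find "$1" -type f -printf '%T@\n' | sort -nr | head -n 1 | cut -d. -f1
}

# Print each immediate subdirectory of $1 whose newest file was modified
# more than $2 and less than $3 minutes ago (defaults: 5 and 10).
idle_dirs() {
  root=${1:-.}
  min=${2:-5}
  max=${3:-10}
  now=$(date +%s)
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue               # glob matched nothing
    newest=$(newest_mtime "$dir")
    [ -n "$newest" ] || continue            # no files yet: nothing to judge
    age=$((now - newest))                   # idle time in seconds
    if [ "$age" -gt $((min * 60)) ] && [ "$age" -lt $((max * 60)) ]; then
      printf '%s\n' "${dir%/}"
    fi
  done
}
```

For example, `idle_dirs /srv/ftp/uploads 5 10` (path hypothetical) could be run from cron every few minutes. It still polls, but because a file that is actively being written keeps updating its own mtime, a long upload keeps its directory out of the list.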
glenn jackman
  • Thanks Glenn; this is what I was looking for, at least to get an interim solution to work – Ramon May 06 '12 at 21:34

I'm also on the "alternate solution" side of the fence.

If you have access to your FTP server's log, I suggest tailing that log to watch for successful uploads. This sort of event-triggered approach will be faster and more reliable, and will put less load on the server, than a polling approach like the one described in your question.

The way you handle this will of course depend on your FTP server. I have one running vsftpd whose logs include lines like this:

Fri Mar 23 07:36:02 2012 [pid 94378] [joe] OK LOGIN: Client "10.8.7.16"
Fri Mar 23 07:36:12 2012 [pid 94380] [joe] OK UPLOAD: Client "10.8.7.16", "/path/to/file.zip", 8395136 bytes, 845.75Kbyte/sec
Fri Mar 23 07:36:12 2012 [pid 94380] [joe] OK CHMOD: Client "10.8.7.16", "/path/to/file.zip 644"

The UPLOAD line only gets added when vsftpd has successfully saved the file. You could parse this in a shell script like this:

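#!/bin/sh

# Follow the vsftpd log (-F keeps following across log rotation).
tail -F /var/log/vsftpd.log | while IFS= read -r line; do
  case "$line" in
    *'OK UPLOAD:'*)
      # The uploaded path is the second double-quoted field on the line.
      filename=$(printf '%s\n' "$line" | cut -d'"' -f4)
      if [ -s "$filename" ]; then
        # do something with $filename
        echo "upload complete: $filename"
      fi
      ;;
  esac
done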

It's not the fanciest shell script, and to be honest I'd probably write it a little differently if I were using it myself, but this illustrates the idea well enough.
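To turn per-file UPLOAD events into something closer to "this user's session has gone quiet", one option is to track the last successful upload per user. The function below is a sketch that assumes the log layout shown in the sample lines above (the user name in brackets as the eighth whitespace-separated field); the name `last_upload_per_user` is hypothetical.

```shell
#!/bin/sh
# Summarise a vsftpd log (read from the given files, or stdin) into the most
# recent successful upload per user. Assumed field layout:
#   <weekday> <month> <day> <time> <year> [pid N] [user] OK UPLOAD: ...
last_upload_per_user() {
  awk '/ OK UPLOAD: / {
         user = $8                       # e.g. "[joe]"
         gsub(/^\[|\]$/, "", user)       # strip the brackets
         last[user] = $1 " " $2 " " $3 " " $4 " " $5
       }
       END { for (u in last) print u ": " last[u] }' "$@"
}
```

Run against the whole log (e.g. `last_upload_per_user /var/log/vsftpd.log`), it prints one line per user; comparing those timestamps with the current time would give an "idle for N minutes" signal without walking the upload directories. The order of users in the output is unspecified, so pipe it through `sort` if that matters.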

ghoti
  • with a `sleep 60` or 300 or 600 in there at the bottom of the while loop, right? ;-) Nice alternate! – shellter May 06 '12 at 22:25
  • Thanks. Actually, you wouldn't want a "sleep" at the bottom of the while loop, unless you have some reason to delay before handling the next uploaded file. Since you're using `tail -F`, the log output will trigger the loop only for new log entries. Once the `# do something` code is executed, this script will simply wait for new log data to come through. A polling solution would need a delay, but this is *[event driven](http://en.wikipedia.org/wiki/Event-driven_programming)* instead of polled. – ghoti May 07 '12 at 15:52