I am trying to mirror a public FTP server to a local directory. When I use `wget -m {url}`, wget quickly skips the many files that have already been downloaded and have no newer version. When I use `lftp open -u user,pass {url}; mirror`, lftp sends MDTM for every file before deciding whether to download it. With 2 million+ files in 50 thousand+ directories this is very slow. I also get error messages saying that the MDTM of directories could not be obtained.

The manual says that `set sync-mode off` makes lftp send all requests at once, without waiting for each response. When I do that, the server returns errors saying there are too many connections from my IP address.

I tried running wget first to download only the newer files, but wget does not delete local files that were removed from the FTP server. I therefore follow up with lftp to remove the old files, but lftp still sends MDTM for each file, so this approach has no advantage.

If I use `set ftp:use-mdtm off`, lftp seems to download all the files again.

Could someone suggest the correct settings for lftp with a large number of directories and files (specifically, so that it skips directories that were not updated, as wget seems to do)?

econ

1 Answer

Use `set ftp:use-mdtm off` and `mirror --ignore-time` for the first invocation to avoid re-downloading all the files.

You can also try upgrading lftp and/or using `set ftp:use-mlsd on`. In that case lftp gets the precise file modification time from the MLSD command output (provided that the server supports the command).
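
The first and later runs could be scripted roughly as follows. This is only a sketch, assuming a POSIX shell: the function name `build_mirror_cmd`, its `first` argument, and the host in the usage comment are made up for illustration. `--delete` is included only because the question asks for removed files to be cleaned up locally; drop it if you do not want local files deleted.

```shell
#!/bin/sh
# Build the lftp command string for one mirror run.
# Pass "first" for the initial run over a directory that wget already filled.
build_mirror_cmd() {
    # --delete removes local files that no longer exist on the server (optional).
    opts="--delete"
    # Only the first run needs --ignore-time; later runs can compare timestamps.
    if [ "${1:-}" = "first" ]; then
        opts="$opts --ignore-time"
    fi
    # ftp:use-mdtm off stops lftp from sending MDTM for every file.
    printf 'set ftp:use-mdtm off; mirror %s; quit\n' "$opts"
}

# Usage (hypothetical host and credentials, left commented out):
#   lftp -u "user,pass" -e "$(build_mirror_cmd first)" ftp://ftp.example.org/pub
#   lftp -u "user,pass" -e "$(build_mirror_cmd)" ftp://ftp.example.org/pub
```

With no directory arguments, `mirror` copies the current remote directory into the current local directory, so run it from the directory that holds the existing mirror.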

lav
  • Thank you for this answer. I'm using OS X, and `brew install lftp --upgrade` tells me that I have the latest version, 4.6.5. I can see from http://lftp.yar.ru that 4.7 is out, so I'll wait until it becomes available via brew. – econ Apr 18 '16 at 11:20
  • I tried `set ftp:use-mdtm off` with `mirror --ignore-time`, and the program checked the directories very quickly. Could you clarify: when you say 'first invocation', what would be the correct setting for the second invocation? – econ Apr 18 '16 at 11:24
  • On the second and later invocations the `--ignore-time` option won't be needed, as the timestamps will already correspond to the displayed timestamps after the first mirror. – lav Apr 18 '16 at 14:30