19

What's the best way of comparing two directory structures and deleting extraneous files and directories in the target location?

I have a small web photo gallery app that I'm developing. Users add and remove images using FTP. The web gallery software I've written creates new thumbnails on the fly, but it doesn't deal with deletions. What I would like to do is schedule a command or bash script to take care of this at predefined intervals.

Original images are stored in /home/gallery/images/ and are organised in albums, using subdirectories. The thumbnails are cached in /home/gallery/thumbs/, using the same directory structure and filenames as the images directory.

I've tried using the following to achieve this:

rsync  -r --delete --ignore-existing /home/gallery/images /home/gallery/thumbs

This would work fine if all the thumbnails had already been cached, but there is no guarantee of that. When some thumbnails are missing, the original full-size images are copied into the thumbs directory.

How can I best achieve what I'm trying to do?

Bryan

3 Answers

48

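You need --existing too, plus a trailing slash on the source so that rsync compares the contents of images with thumbs instead of creating thumbs/images:

rsync -r --delete --existing --ignore-existing /home/gallery/images/ /home/gallery/thumbs/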

From the manpage:

  --existing, --ignore-non-existing
          This tells rsync to skip creating files (including  directories)
          that  do  not  exist  yet on the destination.  If this option is
          combined with the --ignore-existing option,  no  files  will  be
          updated  (which  can  be  useful if all you want to do is delete
          extraneous files).
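
Before letting the command delete anything, it may be worth previewing it. The following is a minimal sketch, assuming rsync is installed; the temporary directories are hypothetical stand-ins for /home/gallery. It relies on rsync's -n (--dry-run) option, which reports what would happen without changing anything, and -v, which lists the affected paths.

```shell
#!/bin/sh
# Throwaway demo tree (hypothetical stand-in for /home/gallery).
demo=$(mktemp -d)
mkdir -p "$demo/images/album" "$demo/thumbs/album"
touch "$demo/images/album/keep.jpg" "$demo/images/album/uncached.jpg"
touch "$demo/thumbs/album/keep.jpg" "$demo/thumbs/album/orphan.jpg"

# Dry run: -n only reports what would be deleted, -v prints the paths.
# The trailing slash on the source means "the contents of images",
# so they are compared directly against the contents of thumbs.
rsync -rnv --delete --existing --ignore-existing "$demo/images/" "$demo/thumbs/"

# Real run: removes thumbs that have no original, copies nothing.
rsync -r --delete --existing --ignore-existing "$demo/images/" "$demo/thumbs/"
```

In this demo the real run should remove orphan.jpg from thumbs, leave keep.jpg alone, and not copy uncached.jpg across.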
Joril
  • 1
    If there are any errors at all, your awesome answer won't work. That's why you also need the `--ignore-errors` argument. That's the only thing that worked for me. Thank you for `--existing` and `--ignore-existing`! I used your answer as the basis of my answer [here](https://askubuntu.com/a/1161317/256054). – LonnieBest Jul 26 '19 at 18:58
  • 2
    @LonnieBest If there are errors, not even `rm`, `cp` or `mv` will work, as that's what errors are: they are problems that should be looked at and that prevent operations from completing successfully. You can instruct most tools to ignore errors (e.g. `-f` for `rm`), but I don't see how that's relevant to the question or this answer. – Mecki Jan 02 '20 at 00:51
  • @Joril it did nothing on my end. – user2284570 Apr 18 '21 at 17:48
8

I don't think rsync is the best approach for this. I would use a bash one-liner like the following:

$ cd /home/gallery/thumbs && find . -type f | while IFS= read -r file; do if [ ! -f "../images/$file" ]; then echo "$file"; fi; done

If this one-liner produces the right list of files, you can then modify it to run an rm command instead of an echo command.
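
Once the list looks right, the deletion and the clean-up of leftover directories could be combined along the following lines. This is a minimal sketch rather than a tested production script: prune_thumbs is a hypothetical helper name, the temporary tree stands in for /home/gallery, and it really deletes files, so swapping rm and rmdir for echo on a first run is a sensible precaution.

```shell
#!/bin/sh
# prune_thumbs IMAGES_DIR THUMBS_DIR  (hypothetical helper name)
# The ( ... ) body runs in a subshell, so the cd does not affect the caller.
prune_thumbs() (
    images=$(cd "$1" && pwd) || exit 1   # absolute path to the originals
    cd "$2" || exit 1

    # Pass 1: delete thumbnails whose original image no longer exists.
    find . -type f -exec sh -c '
        for f do [ -f "$0/$f" ] || rm -- "$f"; done' "$images" {} +

    # Pass 2: -depth lists children before parents, so nested orphaned
    # directories are already empty when rmdir reaches their parent.
    find . -depth -type d -exec sh -c '
        for d do [ -d "$0/$d" ] || rmdir -- "$d"; done' "$images" {} +
)

# Demo on a throwaway tree (hypothetical stand-in for /home/gallery).
demo=$(mktemp -d)
mkdir -p "$demo/images/album" "$demo/thumbs/album" "$demo/thumbs/old album"
touch "$demo/images/album/keep.jpg" "$demo/thumbs/album/keep.jpg"
touch "$demo/thumbs/album/gone.jpg" "$demo/thumbs/old album/x y.jpg"
prune_thumbs "$demo/images" "$demo/thumbs"
```

Letting find hand the names to sh through -exec avoids parsing them from a pipe, so file names containing spaces or backslashes are passed through unchanged.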

Tom Shaw
  • Thanks Tom. I guess to also clean the directories, I'd need to run it a second time, but specifying directories in the commands instead of files, and substituting the echo with `rmdir`? – Bryan May 31 '11 at 13:42
  • @Bryan: Yes that sounds reasonable. You'd need to change the flags in the `find` and in the `[` test `]`. Of course, please be very careful both with the command I've given you and any modifications, and test thoroughly with `echo`! – Tom Shaw May 31 '11 at 13:49
  • Many thanks, I'll apply copious amounts of echo whilst testing. – Bryan May 31 '11 at 14:07
  • 1
    Just had a thought: you could test with "ls" as well to ensure it works well with whitespace. Best wishes. – Tom Shaw May 31 '11 at 14:11
0

I have to transfer a large amount of data spread over many files. I used msrsync to parallelise the rsync streams, which works well, but you cannot use the rsync option '--delete' with msrsync because the multiple streams conflict and try to delete each other's files. So I started looking for a way to delete files and found this question.

My final solution, using the original question as an example and building on a previous answer (Tom Shaw's), is:

$ cd /home/gallery
$ find thumbs -type f | sed -e 's|^thumbs/||' | xargs -P64 -I% sh -c 'if [ ! -f "images/$1" ]; then echo "rm thumbs/$1"; fi' -- %

The intent here is to remove only those files from thumbs/ that do not exist in images/. This solution may leave behind empty directories in thumbs/ that have no counterpart in images/.

Using xargs allows this to be parallelised: '-P64' runs up to 64 processes at once.

As per Tom Shaw's solution, I have used echo so you can check that the outcome is as expected before making it actually delete files.

I post this alternative solution for those who may have millions of files to deal with and have the resources to run many processes in parallel.
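
For trees where file names may contain spaces or quotes, a null-delimited variant of the same idea might look like the sketch below. It assumes GNU or BSD find and xargs (-print0, -0 and -P are not part of POSIX); list_orphans is a hypothetical helper name, the batch size of 100 and the 8 parallel processes are arbitrary illustrative values, and the temporary tree stands in for /home/gallery. Like the command above, it only prints the rm commands.

```shell
#!/bin/sh
# list_orphans GALLERY_DIR  (hypothetical helper name)
# GALLERY_DIR is expected to contain images/ and thumbs/.
list_orphans() (
    cd "$1" || exit 1
    # -print0 / -0 separate names with NUL bytes, so any file name is safe.
    # -n 100 hands each sh process a batch of names; -P 8 runs 8 at once.
    find thumbs -type f -print0 |
        xargs -0 -n 100 -P 8 sh -c '
            for t do
                rel=${t#thumbs/}          # path relative to thumbs/
                [ -f "images/$rel" ] || printf "rm -- %s\n" "$t"
            done' sh
)

# Demo on a throwaway tree (hypothetical stand-in for /home/gallery).
demo=$(mktemp -d)
mkdir -p "$demo/images/album" "$demo/thumbs/album" "$demo/thumbs/old album"
touch "$demo/images/album/keep.jpg" "$demo/thumbs/album/keep.jpg"
touch "$demo/thumbs/album/gone.jpg" "$demo/thumbs/old album/x y.jpg"
list_orphans "$demo"
```

Processing names in batches starts far fewer shells than one per file, which matters with millions of files. Because the batches run in parallel, the order of the printed lines is not guaranteed.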

JohnM