
I have many files in a directory A.

Some of those files exist in a directory tree with sub-directories B/B1, B/B2, B/B3, B/B4, ... Note that some files have spaces in their names.

For example:

In directory A:

  • there's a file named A/red file.png

  • there's another named A/blue file.png

and, in directory tree B:

  • there's a file named B/small/red file.png

In this example, I would like a script to tell me that the file blue file.png does not exist anywhere in the directory tree B.

How can I write a script that will list all the files in A that are not found under the directory tree B?

martin jakubik

2 Answers

# A
# ├── blue file.png
# └── red file.png
# B
# └── small
#     └── red file.png

$ comm -23 <( find A -type f -printf '%f\n' | sort | uniq ) <( find B -type f -printf '%f\n' | sort | uniq )
blue file.png
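The whole trick rests on comm, which compares two sorted inputs line by line. A minimal demonstration using the two file lists from the question (typed in by hand here rather than produced by find):

```shell
#!/usr/bin/env bash
# comm reads two *sorted* inputs and prints three columns:
#   column 1: lines only in the first input
#   column 2: lines only in the second input
#   column 3: lines present in both
# -23 suppresses columns 2 and 3, so only names unique to the first input remain.
comm -23 <(printf '%s\n' 'blue file.png' 'red file.png') \
         <(printf '%s\n' 'red file.png')
# prints: blue file.png
```

The `sort | uniq` in the answer can also be written `sort -u`; either way comm gets the sorted, de-duplicated list it needs.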

If your find lacks -printf (for example the BSD find shipped with Mac OS X), you can use basename instead:

comm -23 <( find A -type f -exec basename {} \; | sort | uniq ) <( find B -type f -exec basename {} \; | sort | uniq )
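The basename fallback starts a new basename process for every file found. A lighter sketch, assuming no filename contains a newline, strips the directory part with a single sed call instead. The function names `basenames` and `missing_from` are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the answer).
# Assumes no filename contains a newline.

# Print the basename of every regular file under directory $1, sorted and
# de-duplicated. sed deletes everything up to and including the last '/',
# so this runs one sed process instead of one basename process per file.
basenames() {
  find "$1" -type f | sed 's|.*/||' | sort -u
}

# Print names that exist somewhere under $1 but nowhere under $2.
missing_from() {
  comm -23 <(basenames "$1") <(basenames "$2")
}
```

With the example tree from the question, `missing_from A B` should print `blue file.png`.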
Thedward
  • Thanks. My "find" doesn't seem to have -printf, so I used -print instead. That shouldn't make any difference, right? – martin jakubik Jul 11 '12 at 21:02
  • Hang on. Something's not working. I'm still getting files in the result that exist in both A and B. – martin jakubik Jul 11 '12 at 21:16
  • Okay. Maybe it's the -printf. When I use -print instead, I get the whole filename, including the directory. If -printf '%f' gives me the file's basename, I'll accept the answer, but how would I adapt this script to a "find" that doesn't have -printf? – martin jakubik Jul 11 '12 at 21:23
  • Try: `comm -23 <( find A -type f -exec basename {} \; | sort | uniq ) <( find B -type f -exec basename {} \; | sort | uniq )` – Thedward Jul 11 '12 at 21:30
  • What OS / Version are you using? – Thedward Jul 11 '12 at 21:33
  • The -exec basename {} worked. What's that "\;"? I'm on a Mac with Snow Leopard, but not sure how to tell what version my shell is, nor its tools. – martin jakubik Jul 11 '12 at 21:41
  • The `-exec` option runs a command on each matching file; the `{}` is replaced by the file path, and the command is terminated with a semicolon. The backslash before the semicolon prevents your shell from interpreting it as shell syntax. – Thedward Jul 12 '12 at 12:51
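To see what the backslash does, printf can show the argument the shell actually passes along. In this small sketch the temporary directory exists only so the find commands have something to run on:

```shell
#!/usr/bin/env bash
# Both \; and ';' reach the command as a single literal ';' argument,
# which is what find's -exec expects as its terminator.
printf '[%s]\n' \; ';'
# prints:
# [;]
# [;]

# So these two find commands are equivalent. {} is replaced by each file's
# path, and basename runs once per file.
tmp=$(mktemp -d)
mkdir "$tmp/A"
touch "$tmp/A/red file.png" "$tmp/A/blue file.png"
find "$tmp/A" -type f -exec basename {} \; | sort
find "$tmp/A" -type f -exec basename {} ';' | sort
rm -rf "$tmp"
```

Leaving the `;` unescaped would make the shell end the find command there, and find would then report an error because -exec has no terminator.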

This is a version that can cope with all filenames, including ones containing newlines:

comm -z23 <(find dir1 -type f -printf '%f\0' | sort -uz) <(find dir2 -type f -printf '%f\0' | sort -uz) | xargs -0 printf '%s\n'
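A small sketch to exercise that version, assuming GNU findutils and coreutils (find's -printf is GNU-specific, and neither sort -z nor comm -z is in POSIX). It builds a throwaway tree that includes a name with an embedded newline, and brackets each result so the record boundaries are visible:

```shell
#!/usr/bin/env bash
# Assumes GNU find (-printf), GNU sort (-z) and GNU comm (-z).
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2/small"
touch "$tmp/dir1/red file.png" "$tmp/dir1/blue file.png" "$tmp/dir2/small/red file.png"
# A name containing a newline: a newline-delimited pipeline would split it in two.
touch "$tmp/dir1/two"$'\n'"lines.png"

# NUL-delimited end to end, so every name stays a single record.
comm -z23 <(find "$tmp/dir1" -type f -printf '%f\0' | sort -uz) \
          <(find "$tmp/dir2" -type f -printf '%f\0' | sort -uz) |
  xargs -0 printf '[%s]\n'
# prints:
# [blue file.png]
# [two
# lines.png]

rm -rf "$tmp"
```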
Jon