4

I'm trying to run find, and exclude several directories listed in an array. I'm finding some weird behavior when it's expanding, though, which is causing me issues:

~/tmp> skipDirs=( "./dirB" "./dirC" )
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/\*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
./dirC/bar.txt ./dirA/bar.txt

This did not skip dirC as I would have expected. The problem is that the printf output leaves literal quotes around "./dirC".

~/tmp> set -x 
+ set -x
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
+++ printf -- '-o -path "%s/*" ' ./dirC
++ find . -name 'bar*' -not '(' -path './dirB/*' -o -path '"./dirC/*"' ')' -prune
+ bars='./dirC/bar.txt
./dirA/bar.txt'
+ echo ./dirC/bar.txt ./dirA/bar.txt
./dirC/bar.txt ./dirA/bar.txt

If I try to remove the quotes in the $(printf ...), then the * gets expanded immediately, which also gives the wrong results. Finally, if I remove the quotes and try to escape the *, then the \ escape character gets included as part of the filename in the find, and that does not work either. I'm wondering why the above does not work, and what would work? I'm trying to avoid using eval if possible, but currently I'm not seeing a way around it.

Note: This is very similar to: Finding directories with find in bash using a exclude list, however, the posted solutions to that question seem to have the issues I listed above.

John

2 Answers

5

The safe approach is to build your array explicitly:

#!/bin/bash

skipdirs=( "./dirB" "./dirC" )

skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
    skipdirs_args+=( -o -type d -path "$i" )
done

find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*'
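As a quick sanity check, here is that script run against a throwaway tree (the dirA/dirB/dirC names are invented for the demo; note that -false is a GNU find extension):

```shell
#!/bin/bash
# Sandbox tree (names are illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp"/dirA "$tmp"/dirB "$tmp"/dirC
touch "$tmp"/dirA/bar.txt "$tmp"/dirB/bar.txt "$tmp"/dirC/bar.txt
cd "$tmp" || exit 1

skipdirs=( "./dirB" "./dirC" )

# -false makes the group start out false, so every real term can be
# appended uniformly as '-o -type d -path ...'.
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
    skipdirs_args+=( -o -type d -path "$i" )
done

bars=$(find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*')
echo "$bars"    # → ./dirA/bar.txt

cd / && rm -rf "$tmp"
```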

I slightly modified the logic of your find, since your command had a logic error in it:

find -name 'bar*' -not stuff_to_prune_the_dirs

How does find proceed? It walks the file tree, and when it finds a file (or directory) that matches bar*, it then applies the -not ... part. That's really not what you want: your -prune is never going to take effect!

Look at this instead:

find . \! \( -type d -path './dirA' -prune \)

Here find will completely prune the directory ./dirA and print everything else. Now it's among everything else that you want to apply the filter -name 'bar*'! the order is very important! there's a big difference between this:

find . -name 'bar*' \! \( -type d -path './dirA' -prune \)

and this:

find . \! \( -type d -path './dirA' -prune \) -name 'bar*'

The first one doesn't work as expected at all! The second one is fine.
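The difference is easy to reproduce; this sketch assumes a made-up tree with bar.txt in both ./dirA and ./dirB:

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp"/dirA "$tmp"/dirB
touch "$tmp"/dirA/bar.txt "$tmp"/dirB/bar.txt
cd "$tmp" || exit 1

# Filter-first: -prune is only ever evaluated on entries that already
# matched -name 'bar*', so descent into ./dirA is never cut off.
wrong=$(find . -name 'bar*' \! \( -type d -path './dirA' -prune \) | sort)
echo "$wrong"    # → ./dirA/bar.txt and ./dirB/bar.txt

# Prune-first: ./dirA is skipped entirely, then -name 'bar*' filters
# what is left.
right=$(find . \! \( -type d -path './dirA' -prune \) -name 'bar*')
echo "$right"    # → ./dirB/bar.txt

cd / && rm -rf "$tmp"
```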

Notes.

  • I'm using \! instead of -not as \! is POSIX, -not is an extension not specified by POSIX. You'll argue that -path is not POSIX either so it doesn't matter to use -not. That's a detail, use whatever you like.
  • You had to use a dirty trick to build your command to skip your dirs, as you had to treat the first term separately from the others. By initializing the array with -false, I don't have to treat any term specially.
  • I'm specifying -type d so that I'm sure I'm pruning directories.
  • Since my pruning really applies to the directories, I don't have to include wildcards in my exclude terms. This is funny: your problem that seemingly is about wildcards that you can't handle disappears completely when you use find appropriately as explained above.
  • Of course, the method I gave really applies with wildcards too. For example, if you want to exclude/prune all subdirectories called baz inside subdirectories called foo, the skipdirs array given by

    skipdirs=( "./*/foo/baz" "./*/foo/*/baz" )
    

    will work fine!
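A minimal check of that wildcard variant, on an invented tree:

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp"/a/foo/baz "$tmp"/b/foo/x/baz
touch "$tmp"/a/foo/baz/bar.txt "$tmp"/b/foo/x/baz/bar.txt "$tmp"/bar.txt
cd "$tmp" || exit 1

# The quotes keep the wildcards out of the shell's hands; find itself
# matches them against the full path.
skipdirs=( "./*/foo/baz" "./*/foo/*/baz" )
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
    skipdirs_args+=( -o -type d -path "$i" )
done

found=$(find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*')
echo "$found"    # → ./bar.txt

cd / && rm -rf "$tmp"
```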

gniourf_gniourf
4

The issue here is that the quotes you are using on "%s/*" aren't doing what you think they are.

That is to say, you think you need the quotes on "%s/*" to prevent the results from the printf from being globbed however that isn't what is happening. Try the same thing without the directory separator and with files that start and end with double quotes and you'll see what I mean.

$ ls
"dirCfoo"
$ skipDirs=( "dirB" "dirC" )
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirCfoo"
$ rm '"dirCfoo"'
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirC*"

See what I mean? The quotes aren't being handled specially by the shell. They just happen not to glob in your case.

This issue is part of why things like what is discussed at http://mywiki.wooledge.org/BashFAQ/050 don't work.

To do what you want here I believe you need to create the find arguments array manually.

sD=(-path /dev/null)
for dir in "${skipDirs[@]}"; do
    sD+=(-o -path "$dir")
done

and then expand "${sD[@]}" on the find command line (-not \( "${sD[@]}" \) or so).
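Put together, that looks like this (the sandbox tree is invented for the demo):

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp"/dirA "$tmp"/dirB "$tmp"/dirC
touch "$tmp"/dirA/bar.txt "$tmp"/dirB/bar.txt "$tmp"/dirC/bar.txt
cd "$tmp" || exit 1

skipDirs=( "./dirB" "./dirC" )

# -path /dev/null can never match, so every real term can be
# appended uniformly as '-o -path ...'.
sD=(-path /dev/null)
for dir in "${skipDirs[@]}"; do
    sD+=(-o -path "$dir")
done

bars=$(find . -not \( \( "${sD[@]}" \) -prune \) -name 'bar*')
echo "$bars"    # → ./dirA/bar.txt

cd / && rm -rf "$tmp"
```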

And yes, I believe this makes the answer you linked to incorrect (though the other answer might work, for non-whitespace files etc., because of the array indirection that is going on).

Etan Reisner
    Kudos for explaining the quoting issue. Let me try a summary: The command substitution is _unquoted_, so its output is subject to shell expansions, including globbing. The double quotes become a _literal_ part of the tokens they enclose - they are _not_ parsed as a double-quoted string by the shell (only `eval` would do that). So, globbing _is_ applied to _literal_ `"*"`, which will _typically_ not match anything (since most filenames aren't enclosed in literal double quotes). What ultimately goes wrong is that the literal double quotes are passed to `find` _as part of the argument_. – mklement0 Feb 24 '15 at 22:14
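That summary can be reproduced in a few lines (the filenames are contrived for the demo): the unquoted command substitution emits a token containing literal double quotes, and globbing is applied to that token as-is:

```shell
#!/bin/bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch '"quoted"' plain

# The output of an unquoted $(...) is word-split and globbed, but any
# double quotes inside it are literal characters, not shell syntax.
matched=$(printf '%s\n' $(printf '"%s"' '*'))
echo "$matched"      # the glob '"*"' matches only the oddly named file

rm '"quoted"'
# With no matching file, the pattern is left in place, literally.
unmatched=$(printf '%s\n' $(printf '"%s"' '*'))
echo "$unmatched"

cd / && rm -rf "$tmp"
```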