6

Now, before you think "this has been done before", please read on.

Like most people writing a find script in bash, you start by hard-coding a one-line command, then edit it so often over the following months or years that you end up wishing you had done it right the first time.

I am writing a small backup program that backs up directories, and I need to find them while excluding a list of directories. Easier said than done. Let me set the stage:

#!/bin/bash
BasePath="/home/adesso/baldar"
declare -a Iggy
Iggy=( "/cgi-bin" 
    "/tmp" 
    "/test" 
    "/html" 
    "/icons" )
IggySubdomains=$(printf ",%s" "${Iggy[@]}")
IggySubdomains=${IggySubdomains:1}
echo $IggySubdomains
exit 0

At the end of this you get /cgi-bin,/tmp,/test,/html,/icons. This proves that the concept works. To take it a bit further, I need to use find to search BasePath only one level deep for all subdirectories, excluding the subdirectories listed in the array.

If I type this by hand it would be:

find /var/www/* \( -path '*/cgi-bin' -o -path '*/tmp' -o -path '*/test' -o -path '*/html' -o -path '*/icons' \) -prune -type d
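One thing worth checking about the hand-typed command: as far as I can tell, `\( ... \) -prune -type d` prints the directories that match the patterns, not the ones that survive the exclusion. The usual idiom is `-prune -o -type d -print`. A minimal sketch, assuming a throwaway directory from `mktemp -d` as a hypothetical stand-in for /var/www and a shortened exclude list:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox standing in for /var/www.
root=$(mktemp -d)
mkdir -p "$root"/{tmp,html,site1,site2}

# As typed in the question: only the directories matching the patterns are printed.
pruned=$(find "$root"/* \( -path '*/tmp' -o -path '*/html' \) -prune -type d | sort)

# With "-o ... -print": matches are pruned and skipped, the rest is printed.
kept=$(find "$root"/* \( -path '*/tmp' -o -path '*/html' \) -prune -o -type d -print | sort)

# Strip the sandbox prefix so only the relative names are shown.
echo "pruned form:"; echo "${pruned//"$root"/}"
echo "kept form:";   echo "${kept//"$root"/}"

rm -rf "$root"
```

The second form is probably what a backup script wants, since it lists the directories to back up rather than the ones to skip.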

And should I maybe want to loop into each subdirectory and do the same... I hope you get my point.

So what I am trying to do seems possible, but I have a bit of a problem: printf ",%s" doesn't like me using all those find -path or -o options. Does this mean I have to use eval again?
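A likely cause of the printf complaint, sketched here with a shortened array: bash's printf builtin parses a format string that starts with `-o` as an option. Passing `--` first ends option parsing, so no eval is needed for this part.

```bash
#!/usr/bin/env bash
Iggy=( "/cgi-bin" "/tmp" "/test" )

# Without --, the leading -o is taken as an (invalid) option and printf fails.
if ! printf '-o -path %s ' "${Iggy[@]}" >/dev/null 2>&1; then
    echo "printf rejected the format string"
fi

# With --, the format string is used as-is and repeated once per array element.
joined=$(printf -- '-o -path %s ' "${Iggy[@]}")
echo "$joined"
```

Note that `joined` is still a single flat string, so it only turns back into separate find arguments through word splitting, which is where names containing whitespace would go wrong.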

I am trying to use the power of bash here, and not some for loop. Any constructive input would be appreciated.

Adesso
  • You don't need to `declare -a`, the assignment is enough. Variable names in bash are generally lowercase (or all caps for environment variables) and not camel case. There's no need to `exit 0` at the end (`echo` will have returned 0 anyway). – sorpigal Nov 15 '11 at 17:08
  • CamelCase just makes stuff easy to read and is a habit of mine I kind of like. I am guessing that it would not break anything... but thanks for the other tips, I am really starting to like the community here. :D – Adesso Nov 16 '11 at 08:54
  • Yup, CamelCase is fine; the only place where bad variable names lead to bugs is using ALL_CAPS for things that aren't environment variables or builtins (as that leads to conflicts with variables that _are_ in one of those classes). – Charles Duffy Feb 27 '15 at 22:26

3 Answers

5

Try something like

find /var/www/* \( -path "*${Iggy[0]}" $(printf -- '-o -path *%s ' "${Iggy[@]:1}") \) -prune -type d

and see what happens.

EDIT: added the leading * to each path as in your example.

And here's a complete solution based on your description.

#!/usr/bin/env bash
basepath="/home/adesso/baldar"
ignore=("/cgi-bin" "/tmp" "/test" "/html" "/icons")

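# The $(printf ...) is left unquoted on purpose so that it word-splits; quote characters inside its format string would reach find literally, so it has none.
find "${basepath}" -maxdepth 1 -not \( -path "*${ignore[0]}" $(printf -- '-o -path *%s ' "${ignore[@]:1}") \) -not -path "${basepath}" -type d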

Subdirectories of $basepath excluding those listed in $ignore, presuming at least two in $ignore (fixing that is not hard).
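For names that contain spaces, one loop-free variation is to build the exclusion as an array instead of a word-split string. This is only a sketch under some assumptions: bash 4 or later for `mapfile`, no newlines in the ignored names, and a `mktemp -d` sandbox with made-up directory names in place of the real base path. The names `exclude` and `subdirs` are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical sandbox so the example is self-contained.
basepath=$(mktemp -d)
mkdir -p "$basepath"/{cgi-bin,tmp,site1,"my site"}
ignore=("/cgi-bin" "/tmp")

# printf emits one find argument per line: -o, -path, *<name>, repeated per element.
# mapfile turns each line into exactly one array element, spaces included.
mapfile -t exclude < <(printf -- '-o\n-path\n*%s\n' "${ignore[@]}")
exclude=("${exclude[@]:1}")   # drop the leading -o

# Quoted array expansion passes every pattern to find as a single word.
mapfile -t subdirs < <(
    find "$basepath" -mindepth 1 -maxdepth 1 -type d -not \( "${exclude[@]}" \) | sort
)

printf '%s\n' "${subdirs[@]##*/}"
rm -rf "$basepath"
```

Because nothing here goes through unquoted command substitution, the patterns are never glob-expanded against the current directory, and `-mindepth 1` replaces the separate `-not -path "${basepath}"` test.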

sorpigal
  • So I ended up wrapping your code in a command substitution and assigning it to a variable, but I would like it to be an array so I can loop over the values and do my backup per directory. `SubDomains=$(find ${BasePath}/* -maxdepth 0 -not \( -path "*${Iggy[0]}" $(printf -- '-o -path "*%s" ' "${Iggy[@]:1}") \) -type d)` I can't seem to find the meaning of the `--` you use in printf. Sorry for the CamelCase – Adesso Nov 16 '11 at 09:48
  • Try `printf '-o'` by itself and you'll see what it does. For turning the results of this find back into an array I recommend a little command substitution and a while read, e.g. `while IFS= read -r file ; do ... done < <(find ...)` – sorpigal Nov 16 '11 at 12:00
  • This looks very much like it would run into trouble if the `ignore` list contains strings with literal quotes, whitespace, etc. Much safer to build up the strings into a single-quoted array, rather than using command substitution to generate a string that's then string-split and glob expanded. – Charles Duffy Feb 27 '15 at 22:17
  • @Sorpigal, I am trying to do the following but it's not working - it only seems to ignore the first element in the list I created. Is there something wrong in the way I populated the list? `#!/bin/bash ignore=() while IFS= read -r -d $'\n'; do ignore+=("/$REPLY") done < <(find ! -path . -type d -printf '%T@ %P\n' | sort -nr | head -n$1 | awk '{print $2}' && readlink current)` – readytotaste Aug 19 '18 at 12:24
  • @walksignison: Addressing that is too complicated to do here. You might create a new question, or come to ##linux on irc.freenode.net and ask again. – sorpigal Aug 26 '18 at 22:27
2

The existing answers are buggy when given directory names that contain literal whitespace. The safe and robust practice is to use a loop. If your concern is leveraging "the power of bash" -- I'd argue that a robust solution is more powerful than a buggy one. :)

BasePath="/home/adesso/baldar"
declare -a Iggy=( "/cgi-bin" "/tmp" "/test" "/html" "/icons" )

find_cmd=( find "$BasePath" '(' )

## This is the conventional approach:
# for x in "${Iggy[@]}"; do
#   find_cmd+=( -path "*${x}" -o )
# done
# # ...then drop the trailing -o that the loop leaves behind:
# find_cmd=( "${find_cmd[@]:0:${#find_cmd[@]} - 1}" )

## This is the unconventional, only-barely-safe approach
## ...used only to avoid looping:
printf -v find_cmd_str ' -path "*"%q -o ' "${Iggy[@]}"
find_cmd_str=${find_cmd_str%" -o "}   # strip the trailing -o
eval "find_cmd+=( $find_cmd_str )"

# and add the suffix
find_cmd+=( ')' -prune -type d )

# ...finally, to run the command:
"${find_cmd[@]}"
Charles Duffy
  • The point was **not to use a loop**. I think there are enough examples of how to loop and take care of spaces and quotes, this was not the point. **Simplicity** was. The Bugs are features. I was trying to grasp the **basic** power of bash. -- But now you got me started again, so I am trying to see if I can solve this space and quotes problem without loops ;) – Adesso Feb 27 '15 at 22:49
  • In a lot of similar cases, clever use of parameter expansion operators will do the trick (prepending or suffixing each array element, for instance). Not sure if any of those tricks apply here offhand, though. And yes, I understand what you were asking for -- but have a somewhat strong visceral reaction to any question where the "correct" answer is code that nobody should ever use in a production system. If you found a **safe** non-looping answer, it would certainly have my vote. – Charles Duffy Feb 27 '15 at 22:51
  • @WillemP.Botha, ...I actually amended this to avoid looping; though it's an awful horrid hack, `printf '%q '` _does_ make this eval-safe. – Charles Duffy Feb 27 '15 at 22:56
  • I have an eval allergy... but it looks legit – Adesso Feb 27 '15 at 23:01
  • I share that allergy. See also "awful horrid hack". – Charles Duffy Feb 27 '15 at 23:04
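The claim in the comments that `%q` makes the eval step safe can be checked in isolation. A small sketch, using two made-up names (one with a space, one with a single quote) rather than anything from the question:

```bash
#!/usr/bin/env bash
# Hypothetical directory names chosen to be awkward for word splitting.
names=( "/my dir" "/it's" )

# %q shell-quotes each name, so eval parses it back into exactly one word.
printf -v str ' -path "*"%q -o ' "${names[@]}"
str=${str%" -o "}            # strip the trailing -o
eval "args=( $str )"

# Each find argument is shown in angle brackets, one per line.
printf '<%s>\n' "${args[@]}"
```

Each pattern comes back as a single array element, which is why the eval does not re-split or re-interpret the names. It is still eval, so the allergy seems reasonable.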
0
FIND="$(which find --skip-alias)"
BasePath="/home/adesso/baldar"
Iggy=( "/cgi-bin" 
    "/tmp" 
    "/test" 
    "/html" 
    "/icons" )
SubDomains=( $("${FIND}" "${BasePath}"/* -maxdepth 0 -not \( -path "*${Iggy[0]}" $(printf -- '-o -path *%s ' "${Iggy[@]:1}") \) -type d) )
echo "${SubDomains[1]}"

Thanks to @Sorpigal I have a solution. I ended up nesting the command substitution so I can use the script from cron, and finally wrapped the whole thing in an array definition. A known problem is a directory with a space in its name. That, however, is a solved problem, so to keep things simple, I think this answers my question.
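For the space-in-name case mentioned above, the `while IFS= read` pattern suggested in the comments on the first answer can fill the array without word splitting. A sketch only, assuming GNU find and sort (for `-print0` and `-z`), a `mktemp -d` sandbox in place of the real base path, and a single hard-coded exclusion to keep it short:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox standing in for the real BasePath.
BasePath=$(mktemp -d)
mkdir -p "$BasePath"/{"my site",shop,tmp}

SubDomains=()
# NUL-delimited output keeps every path intact, whatever characters it contains.
while IFS= read -r -d '' dir; do
    SubDomains+=( "$dir" )
done < <(find "$BasePath" -mindepth 1 -maxdepth 1 -type d -not -path '*/tmp' -print0 | sort -z)

# One backup target per array element.
printf 'would back up: %s\n' "${SubDomains[@]##*/}"
rm -rf "$BasePath"
```

This does use a loop, but only to collect results; the per-directory backup can then iterate over `"${SubDomains[@]}"` safely.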

Adesso