
I very often use find to search for files and symbols in a huge source tree. If I don't limit the directories and file types, it takes several minutes to search for a symbol in a file. (I already mounted the source tree on an SSD and that halved the search time.)

I have a few aliases to limit the directories that I want to search, e.g.:

alias findhg='find . -name .hg -prune -o' 
alias findhgbld='find . \( -name .hg -o -name bld \) -prune -o' 
alias findhgbldins='find . \( -name .hg -o -name bld -o -name install \) -prune -o'

I then also limit the file types as well, e.g.:

findhgbldins \( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \) 

But sometimes I only want to check for symbols in cmake files:

findhgbldins \( -name '*.cmake' -o -name '*.txt' \) -exec egrep -H 'pattern' {} \;

I could make a whole bunch of aliases for all possible combinations, but it would be a lot easier if I could use variables to select the file types, e.g.:

export SEARCHALL="\( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \)"
export SEARCHSRC="\( -name '*.[hc]' -o -name '*.cpp' \)"

and then call:

findhgbldins $SEARCHALL -exec egrep -H 'pattern' {} \;

I tried several variants of escaping \, (, * and ), but no combination worked. The only way I could make it work was to turn off globbing in Bash, i.e. set -f, before calling my 'find' contraption and then turn globbing on again.

One alternative I came up with is to define a set of functions (with the same names as my aliases findhg, findhgbld, and findhgbldins), each taking a simple parameter that is used in a case structure to select the file types I am looking for, something like:

findhg() {
    case $1 in
        '1' )
            find <many file arguments> ;;
        '2' )
            find <other file arguments> ;;
        ...
    esac
}

findhgbld() {
    case $1 in
        '1' )
            find <many file arguments> ;;
        '2' )
            find <other file arguments> ;;
        ...
    esac
}

etcetera

My question is: Is it at all possible to pass these types of arguments to a command as a variable?

Or is there maybe a different way to achieve the same thing, i.e. combining a command (findhg, findhgbld, findhgbldins) with a single argument to get a large number of search combinations?

NZD

1 Answer


It's not really possible to do what you want without unpleasantness. The basic problem is that when you expand a variable without double-quotes around it (e.g. findhgbldins $SEARCHALL), it does word splitting and glob expansion on the variable's value, but does not interpret quotes or escapes, so there's no way to embed something in the variable's value to suppress glob expansion (well, unless you use invalid glob patterns, but that'd keep find from matching them properly too). Putting double-quotes around it (findhgbldins "$SEARCHALL") suppresses glob expansion, but it also suppresses word splitting, which you need to let find interpret the expression properly. You can turn off glob expansion entirely (set -f, as you mentioned), but that turns it off for everything, not just this variable.
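The difference between these expansions can be seen in a small experiment. This is only an illustrative sketch: show_args, QUOTED_PATTERN, and BARE_PATTERN are hypothetical names made up for the demonstration, and it runs in a scratch directory so the globs only see two known files.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints the argument count,
# then each argument in brackets.
show_args() {
    printf '%d:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

# Use a scratch directory so the globs below only see the two files created here.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.py b.py

QUOTED_PATTERN="-name '*.py'"   # quote characters stored inside the value
BARE_PATTERN="-name *.py"       # no quote characters inside the value

literal_quotes=$(show_args $QUOTED_PATTERN)   # split; quotes stay literal characters
glob_expanded=$(show_args $BARE_PATTERN)      # split, then *.py is glob-expanded
single_word=$(show_args "$QUOTED_PATTERN")    # no splitting, no globbing

printf '%s\n' "$literal_quotes" "$glob_expanded" "$single_word"

cd - >/dev/null && rm -rf "$demo_dir"
```

The first line printed is `2: [-name] ['*.py']` (the quotes become part of the pattern, so find would look for names that literally contain them), the second is `3: [-name] [a.py] [b.py]` (the glob was expanded before find ever saw it), and the third is `1: [-name '*.py']` (a single word that find cannot parse as an expression).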

One thing that would work (but would be annoying to use) would be to put the search options in arrays rather than plain variables, e.g.:

SEARCHALL=( \( -name '*.cmake' -o -name '*.txt' -o -name '*.[hc]' -o -name '*.py' -o -name '*.cpp' \) )
findhgbldins "${SEARCHALL[@]}" -exec egrep -H 'pattern' {} \;

but that's a lot to type every time you use it (and you need every quote, bracket, and brace for the array to expand correctly), so it's not very convenient.

My preferred option would be to build a function that interprets its first argument as a list of file types to match (e.g. findhgbldins mct -exec egrep -H 'pattern' {} \; might find make/cmake, c/h, and text files). Something like this:

findhgbldins() {
    local filetypes=() typestr
    if [[ $# -ge 1 && "$1" != "-"* ]]; then # if we were passed a type list (not just a find primitive starting with "-")
        typestr="$1"
        while [[ "${#typestr}" -gt 0 ]]; do
            case "${typestr:0:1}" in # this looks at the first char of typestr
                c) filetypes+=(-o -name '*.[ch]');;
                C) filetypes+=(-o -name '*.cpp');;
                m) filetypes+=(-o -name '*.make' -o -name '*.cmake');;
                p) filetypes+=(-o -name '*.py');;
                t) filetypes+=(-o -name '*.txt');;
                *) echo "Usage: ${FUNCNAME[0]} [cCmpt] [find options]" >&2
                   return 1 ;; # return, not exit: exit would close the calling shell
            esac
            typestr="${typestr:1}" # remove first character, so we can process the remainder
        done
        # Note: at this point filetypes will be something like '-o' -name '*.txt' -o -name '*.[ch]'
        # To use it with find, we need to remove the first element (`-o`), and add parens
        filetypes=( \( "${filetypes[@]:1}" \) )
        shift # and get rid of $1, so it doesn't get passed to `find` later!
    fi

    # Run `find`
    find . \( -name .hg -o -name bld -o -name install \) -prune -o "${filetypes[@]}" "$@"
}

...you could also use a similar approach to building a list of directories to prune, if you wanted to.
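As a rough sketch of that idea, the prune list can be built the same way the file-type list is. Everything here is an assumption made for illustration: findprune is a hypothetical function name, and the comma-separated first argument is just one possible convention (so directory names containing commas are not supported).

```shell
#!/usr/bin/env bash
# findprune is a hypothetical helper, not part of the original answer.
# Usage: findprune DIR[,DIR...] find-expression...
findprune() {
    local -a names prune=()
    local name
    if [[ $# -lt 2 || -z $1 ]]; then
        echo "Usage: ${FUNCNAME[0]} DIR[,DIR...] find-expression..." >&2
        return 1
    fi
    IFS=, read -r -a names <<< "$1"    # split the first argument on commas
    shift
    for name in "${names[@]}"; do
        prune+=(-o -name "$name")
    done
    # Drop the leading -o and wrap the rest in parentheses, as with filetypes.
    find . \( "${prune[@]:1}" \) -prune -o "$@"
}
```

A call such as findprune .hg,bld,install -name '*.py' -print would then skip those three directories. The explicit -print matters: without an action after -o, find applies its implicit -print to the whole expression and lists the pruned directories as well.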

As I said, that'd be my preferred option. But there is a trick (and I do mean trick), if you really want to use the variable approach. It's called a magic alias, and it takes advantage of the fact that aliases are expanded before wildcards, but functions are processed afterward, and does something completely unnatural with the combination. Something like this:

alias findhgbldins='shopts="$SHELLOPTS"; set -f; noglob_helper find . \( -name .hg -o -name bld -o -name install \) -prune -o'
noglob_helper() {
    "$@"
    case "$shopts" in
        *noglob*) ;;
        *) set +f ;;
    esac
    unset shopts
}
export SEARCHALL="( -name *.cmake -o -name *.txt -o -name *.[hc] -o -name *.py -o -name *.cpp )"

Then if you run findhgbldins $SEARCHALL -exec egrep -H 'pattern' {} \;, it expands the alias, records the current shell options, turns off globbing, and passes the find command (including $SEARCHALL, word-split but not glob-expanded) to noglob_helper, which runs the find command with all options, then turns glob expansion back on (if it wasn't disabled in the saved shell options) so it doesn't mess you up later. It's a complete hack, but it should actually work.
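A related variation, offered only as a sketch and not as part of the trick above, keeps the same save/disable/restore handling of globbing but does it inside a function by passing the variable's name instead of its value. The function name find_with_var is hypothetical; it relies on Bash indirect expansion (`${!1}`) and on `$-` containing `f` while noglob is set.

```shell
#!/usr/bin/env bash
# find_with_var is a hypothetical helper, not part of the original answer.
# $1 is the NAME of a variable holding a find expression; the rest goes to find.
find_with_var() {
    local had_noglob=0
    local -a expr
    [[ $- == *f* ]] && had_noglob=1    # was globbing already disabled?
    set -f                             # keep *.cpp and friends literal
    expr=( ${!1} )                     # indirect expansion: word-split, not globbed
    (( had_noglob )) || set +f         # restore globbing only if we turned it off
    shift
    find . \( -name .hg -o -name bld -o -name install \) -prune -o "${expr[@]}" "$@"
}

SEARCHSRC="( -name *.[hc] -o -name *.cpp )"
# Example call: find_with_var SEARCHSRC -exec egrep -H 'pattern' {} \;
```

Because the variable is expanded inside the function while noglob is set, no alias is needed, at the cost of writing SEARCHSRC rather than $SEARCHSRC on the command line.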

Gordon Davisson
  • Your function works, thanks! The only thing I had to change was to add `-name` in front of the file types. E.g. `filetypes+=(-o -name '*.[ch]');;` – NZD Dec 05 '16 at 00:37
  • D'oh! That's what happens when I only sort of test before posting. Anyway, I'm glad it was useful; I've fixed it for the record. – Gordon Davisson Dec 05 '16 at 06:37