4

I'm looking for examples of specifying files in a tree structure, for example, for specifying the set of files to search in a grep tool. I'd like to be able to include and exclude files and directories by name matches. I'm sure there are examples out there, but I'm having a hard time finding them.

Here's an example of a possible syntax:

*.py *.html
*.txt *.js
-*.pyc
-.svn/
-*combo_*.js

(This would mean: include files with the extensions .py, .html, .txt and .js; exclude .pyc files, anything under a .svn directory, and any file matching *combo_*.js.)
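One possible reading of that syntax, as a minimal Python sketch. The names `parse_spec` and `is_selected` are hypothetical, and the semantics are assumed rather than taken from any existing tool: tokens are whitespace-separated, a leading `-` means exclude, a trailing `/` means "directory name", and excludes always win over includes.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SPEC = """\
*.py *.html
*.txt *.js
-*.pyc
-.svn/
-*combo_*.js
"""

def parse_spec(text):
    """Split the spec into (include, exclude_files, exclude_dirs) pattern lists."""
    include, exclude_files, exclude_dirs = [], [], []
    for token in text.split():
        if not token.startswith("-"):
            include.append(token)
        elif token.endswith("/"):
            # "-.svn/" -> directory pattern ".svn"
            exclude_dirs.append(token[1:-1])
        else:
            exclude_files.append(token[1:])
    return include, exclude_files, exclude_dirs

def is_selected(path, spec):
    """Return True if a '/'-separated relative path is selected by the spec."""
    include, exclude_files, exclude_dirs = spec
    parts = PurePosixPath(path).parts
    name, dirs = parts[-1], parts[:-1]
    # Any excluded directory anywhere in the path rejects the file.
    if any(fnmatchcase(d, pat) for d in dirs for pat in exclude_dirs):
        return False
    # Patterns are matched against the file name only, not the whole path.
    if any(fnmatchcase(name, pat) for pat in exclude_files):
        return False
    return any(fnmatchcase(name, pat) for pat in include)

spec = parse_spec(SPEC)
```

With this reading, `is_selected("src/app.py", spec)` is true, while `lib/.svn/entries.txt` and `js/combo_all.js` are rejected. Note that a file whose name really begins with a dash cannot be included without some escaping rule, which this sketch does not define.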

I know I've seen these sorts of specifications in other tools before. Is this ringing any bells for anyone?

cjhines
Ned Batchelder
  • the way to specify such lists is _heavily_ dependent on the tool you are intending to use. as such, your question doesn't actually make much sense, unless with 'a grep tool' you mean a specific tool..? –  Dec 27 '08 at 00:55
  • i just found your comment below, that you want to write your own script... the choice of syntax is completely up to you in this case, but i'll give an answer with one place where you can copy from. –  Dec 27 '08 at 00:58

7 Answers

4

There is no single standard format for this kind of thing, but if you want to copy something that is widely recognized, have a look at the rsync documentation. Look at the chapter on "INCLUDE/EXCLUDE PATTERN RULES."
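The core idea of rsync's filter rules is that rules are checked in order, the first one that matches decides, and a file that matches no rule is included. A much simplified Python sketch of just that idea follows. It is not rsync's implementation: the `(action, pattern)` tuples and the function name are hypothetical, and anchoring, `**`, and directory handling are ignored.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Ordered rules: "+" includes, "-" excludes. Order matters.
RULES = [
    ("-", "*combo_*.js"),
    ("-", "*.pyc"),
    ("+", "*.py"),
    ("+", "*.js"),
    ("-", "*"),          # catch-all: exclude everything else
]

def first_match_decision(path, rules, default=True):
    """Return True if the file is included; the first matching rule wins."""
    name = PurePosixPath(path).name  # simplification: match the file name only
    for action, pattern in rules:
        if fnmatchcase(name, pattern):
            return action == "+"
    return default  # no rule matched: include by default
```

Because the first match wins, the `*combo_*.js` exclude has to come before the `*.js` include; swapping them would let `combo_x.js` through.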

  • Thanks, hop. This may be the original source I was remembering. It's certainly expressive enough for my needs, maybe too much so! – Ned Batchelder Dec 27 '08 at 13:09
2

Apache Ant provides "Ant globs", or patterns, where:

**/foo/**/*.java

means "any file ending in '.java' in a directory which includes a directory named 'foo' in its path" -- including ./foo/X.java
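If you want this style of pattern in your own script, one approach is to translate it to a regular expression. The sketch below is a simplified translation, not Ant's implementation, and the function name is made up. It assumes `/`-separated paths, `**` as a whole path segment meaning "zero or more directories", and `*` and `?` never crossing a `/`.

```python
import re

def ant_glob_to_regex(pattern):
    """Translate an Ant-style glob into a compiled regex (simplified)."""
    out = []
    segments = pattern.split("/")
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "**":
            # Zero or more whole directories (or anything, if it is the last segment).
            out.append(".*" if last else "(?:[^/]+/)*")
        else:
            # Within a segment, wildcards never match the path separator.
            piece = "".join(
                "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
                for ch in seg
            )
            out.append(piece if last else piece + "/")
    return re.compile("".join(out) + r"\Z")

java_under_foo = ant_glob_to_regex("**/foo/**/*.java")
```

Here `java_under_foo.match("./foo/X.java")` and `java_under_foo.match("a/foo/b/c/X.java")` both succeed, while `a/foobar/X.java` does not match because `foo` must be a complete directory name.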

tgdavies
1

How about find in unixish environments?

Find can, of course, do more than build a list of files, but that is one of the common ways it is used. From the man page:

NAME
    find -- walk a file hierarchy

SYNOPSIS
    find [-H | -L | -P] [-EXdsx] [-f pathname] pathname ... expression
    find [-H | -L | -P] [-EXdsx] -f pathname [pathname ...] expression

DESCRIPTION
    The find utility recursively descends the directory tree for each pathname listed, evaluating an expression (composed of the ``primaries'' and ``operands'' listed below) in terms of each file in the tree.

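To achieve your goal I would write something like this (formatted for readability; the patterns are quoted so the shell does not expand them, .svn directories are pruned, and .pyc files are left out because they do not match '*.py'):

find . -type d -name .svn -prune -o \
       -type f \
       \( -name '*.py' -o -name '*.html' -o -name '*.txt' -o -name '*.js' \) \
       ! -name '*combo_*.js' \
       -print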

Moreover, there is an idiomatic pattern using xargs which makes find suitable for sending the whole list so constructed to an arbitrary command, as in:

find /path -type f -print0 | xargs -0 rm
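If the selection ends up in a script instead of a find command line, the same traversal can be sketched in Python with `os.walk`. The function name and parameters below are hypothetical; the one point worth noting is that removing entries from `dirnames` in place stops the walk from descending into them, which plays the role of find's `-prune`.

```python
import os
from fnmatch import fnmatchcase

def select_files(root, include, exclude, prune_dirs):
    """Yield paths under root whose names match include but not exclude."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place keeps os.walk out of pruned directories.
        dirnames[:] = [d for d in dirnames if d not in prune_dirs]
        for name in filenames:
            if any(fnmatchcase(name, p) for p in exclude):
                continue
            if any(fnmatchcase(name, p) for p in include):
                yield os.path.join(dirpath, name)
```

A call such as `select_files(".", ["*.py", "*.html", "*.txt", "*.js"], ["*.pyc", "*combo_*.js"], {".svn"})` yields the files to hand to whatever processing step comes next.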
dmckee --- ex-moderator kitten
1

In your example syntax, is it implicitly understood that there's an escaping character so that you can explicitly include a file that begins with a dash? (The same question goes for any other wildcard characters, but I suppose I'd expect to see more files with dashes in their names than asterisks.)

Various command shells use * (and possibly ? to match a single char), as in your example, but they generally only match against a string of characters that doesn't include a path component separator (i.e. '\' on Windows systems, '/' elsewhere). I've also seen such source control apps as Perforce use additional patterns that can match against path component separators. For instance, with Perforce the pattern "foo/...ext" (without quotes) will match all files under the foo/ directory structure that end with "ext", regardless of whether they are in foo/ itself or in one of its descendant directories. This seems to be a useful pattern.
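To illustrate the difference between the two kinds of wildcard, here is a small Python sketch that turns a Perforce-like pattern into a regular expression. It only models the two behaviors described above (`*` stays within one path component, `...` may cross `/`); the function name is hypothetical and this is not how Perforce itself is implemented.

```python
import re

def p4_wildcard_to_regex(pattern):
    """Translate '...' and '*' wildcards into a compiled regex (simplified)."""
    out = []
    # The capturing group keeps the wildcards in the split result.
    for part in re.split(r"(\.\.\.|\*)", pattern):
        if part == "...":
            out.append(".*")       # crosses path separators
        elif part == "*":
            out.append("[^/]*")    # stays inside one path component
        else:
            out.append(re.escape(part))
    return re.compile("".join(out) + r"\Z")

under_foo = p4_wildcard_to_regex("foo/...ext")
direct_only = p4_wildcard_to_regex("foo/*ext")
```

With these, `under_foo` matches both `foo/a.ext` and `foo/bar/baz.ext`, while `direct_only` matches only the first.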

reuben
1

If you're using bash, you can use the extglob extension to get some nice globbing functions. Enable it as follows:

shopt -s extglob

Then you can do things like the following:

# everything but .html, .jpg or .gif files
ls -d !(*.html|*.gif|*.jpg)
# list file9, file22 but not fileit
ls file+([0-9])
# begins with apl or un only
ls -d +(apl*|un*)

See also this page.

Adam Rosenfield
0

find(1) is a fine tool, as described in the previous answer, but if things get more complicated you should consider either writing your own script in any of the usual suspects (Ruby, Perl, Python et al.) or using one of the more powerful shells such as zsh, which has ** recursive globbing and lets you specify things to exclude. The latter is probably more complicated though.

Keltia
  • I'm intending to write a script to do the processing. What I'm looking for is the syntax for a file that the script can read to determine what files to operate on. – Ned Batchelder Dec 26 '08 at 22:19
  • Most of the current version control systems, such as Mercurial, Subversion and CVS, have the concept of an ignore list (`.hgignore`, `.cvsignore`, Subversion's `svn:ignore` property, and so on). Most of them use glob syntax, and Mercurial also accepts full regexps. – Keltia Dec 26 '08 at 22:36
  • You should not reference "the previous answer", as answers are not sorted by date by default – Sparr Dec 26 '08 at 23:39
0

You might want to check out ack, which allows you to specify file types to search in with options like --perl, etc.

It also ignores .svn directories by default, as well as core dumps, editor cruft, binary files, and so on.