-1

I am writing a generic shell script which filters out files based on a given regex.

My shell script:

files=$(find $path -name $regex)

In one of the cases (to filter), I want to filter folders inside a directory. The names of the folders are in the below format:

20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS

I am unable to arrive at the correct regex.

I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.

But it gives me the full path of the file, something like

/path/20161128-20:34:33:432813246/data.txt

What I want is simply:

/path/20161128-20:34:33:432813246

Please help me in identifying the correct regex for my requirement

NOTE:

I know how to process the data after

files=$(find $path -name $regex)

But since the script needs to be generic for many use cases, I only need the correct regex that needs to be passed.

2 Answers

1
  • Per POSIX, find's -name and -path primaries (tests) use patterns (a.k.a. wildcard expressions, globs) to match filenames and pathnames (while patterns and regular expressions are distantly related, their syntax and capabilities differ significantly; in short: patterns are syntactically simpler, but far less powerful).

    • -name matches the pattern against the basename (mere filename) part of an input path only
    • -path matches the pattern against the whole pathname (the full path)
  • Both GNU and BSD/macOS find implement nonstandard extensions:

    • -iname and -ipath, which work like their standard-compliant counterparts (based on patterns), except that they match case-insensitively.
    • -regex and -iregex tests for matching pathnames by regex (regular expression).
      • Caveat: Both implementations offer at least 2 regex dialects to choose from (-E activates support for extended regular expressions in BSD find, and GNU find allows selecting from several dialects with -regextype), but no two dialects are exactly the same across the two implementations; see the bottom for the gory details.
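
To make the difference between -name and -path concrete, here is a minimal sketch. The directory layout (a `logs` folder inside a temporary directory) is made up purely for illustration:

```shell
# Build a throwaway tree so the example is self-contained (names are illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp/logs/20161128-20:34:33:432813246"
touch "$tmp/logs/20161128-20:34:33:432813246/data.txt"

# -name sees only the last path component, so this finds the file ...
by_name=$(find "$tmp" -name 'data.txt')

# ... while a pattern containing '/' can never match a basename.
# (GNU find prints a warning about this on stderr, hence the redirection.)
by_name_slash=$(find "$tmp" -name 'logs/*' 2>/dev/null)

# -path sees the whole pathname as find prints it, including the starting point.
by_path=$(find "$tmp" -path '*/logs/*/data.txt')

printf '%s\n' "-name 'data.txt':          $by_name"
printf '%s\n' "-name 'logs/*':            ${by_name_slash:-<nothing>}"
printf '%s\n' "-path '*/logs/*/data.txt': $by_path"

rm -rf "$tmp"
```

The first and third commands should report the same file, while the second should report nothing.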

With your folder names following a fixed-width naming scheme, a pattern would work:

pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

Of course, you can take a shortcut if you don't expect false positives:

pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

Note how * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression, but by themselves represent any sequence of characters (*) or any single character (?).
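
You can try out the pattern semantics without find: the shell's `case` statement uses the same basic pattern notation. The sketch below assumes a POSIX-compatible shell; `matches_glob` is a hypothetical helper name and the second sample name is invented to show a false positive of the shortcut pattern:

```shell
# Returns 0 if string $1 matches glob pattern $2 (hypothetical helper).
# Leaving $2 unquoted in the case pattern makes the shell treat it as a pattern.
matches_glob() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

loose='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# A real folder name matches, as intended.
matches_glob '20161128-20:34:33:432813246' "$loose" && echo "real name: match"

# So does this invented name, because * and ? accept any characters, not just digits.
matches_glob '2x-1y:2z:3w:4' "$loose" && echo "invented name: match (false positive)"

# A name without the '-' and ':' separators does not match.
matches_glob 'data.txt' "$loose" || echo "data.txt: no match"
```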

If we put it all together:

files=$(find "$path" -type d -name "$pattern")
  • It's important to double-quote the variable references to protect their values from unwanted shell expansions, notably to preserve any whitespace in the path and to prevent the shell from prematurely globbing the value of $pattern.

  • Note that I've added -type d to limit matching to directories (folders), which rules out regular files that happen to have a matching name and may also improve performance.
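
To see why the quoting matters, here is a small sketch, assuming sh or bash and a temporary directory with two made-up folder names. When the current directory contains names that match the pattern, the unquoted form is expanded by the shell before find ever sees it:

```shell
# Throwaway directory with two folders that match the pattern (names are illustrative).
tmp=$(mktemp -d)
mkdir "$tmp/20161128-20:34:33:432813246" "$tmp/20161129-08:00:00:000000001"
pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# Quoted: find receives the pattern itself and matches both folders.
quoted_count=$(cd "$tmp" && find . -type d -name "$pattern" | wc -l | tr -d ' ')

# Unquoted: the shell first expands the pattern to the two folder names,
# so find receives a stray extra argument and is expected to fail.
unquoted_status=0
(cd "$tmp" && find . -type d -name $pattern >/dev/null 2>&1) || unquoted_status=$?

echo "quoted: $quoted_count matches; unquoted: exit status $unquoted_status"

rm -rf "$tmp"
```

With the unquoted form the outcome depends on what happens to be in the current directory, which is exactly what makes the bug hard to spot.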


Optional background information:

Below is a regex feature matrix as of GNU find v4.6.0 / BSD find as found on macOS 10.12.1:

  • GNU find features are listed by the types supported by the -regextype option, with emacs being the default.

    • Note that several posix-*-named regex types are misnomers in that they support features beyond what POSIX mandates.
  • BSD find features are listed by basic (using NO regex option, which implies platform-flavored BREs) and extended (using option -E, which implies platform-flavored EREs).

For cross-platform use, sticking with POSIX EREs (extended regular expressions) while using -regextype posix-extended with GNU find and using -E with BSD find is safe, but note that not all features you may expect will be supported, notably \b, \</\> and character class shortcuts such as \d.
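
As a rough sketch of that cross-platform approach: the wrapper below (`find_dirs_by_ere` is a hypothetical name) assumes that only GNU find accepts `--version` and that anything else is BSD/macOS find, so it will not work with other implementations such as BusyBox. Note that the regex has to match the entire path, hence the leading `.*/`:

```shell
# find_dirs_by_ere is a hypothetical helper, not part of find itself.
# Usage: find_dirs_by_ere <start-dir> <ERE that matches the WHOLE path>
find_dirs_by_ere() {
  if find --version >/dev/null 2>&1; then
    # Assumption: only GNU find accepts --version.
    find "$1" -regextype posix-extended -type d -regex "$2"
  else
    # Otherwise assume BSD/macOS find, where -E selects extended regexes.
    find -E "$1" -type d -regex "$2"
  fi
}

# Throwaway tree for demonstration purposes (names are illustrative).
tmp=$(mktemp -d)
mkdir "$tmp/20161128-20:34:33:432813246" "$tmp/not-a-timestamp"

# Only POSIX ERE features are used: bracket expressions and {n} quantifiers.
ere='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'
found=$(find_dirs_by_ere "$tmp" "$ere")
printf '%s\n' "$found"

rm -rf "$tmp"
```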

=================== GNU find ===================
== REGEX FEATURE: \{\}
TYPE: awk:                                        -
TYPE: egrep:                                      -
TYPE: ed:                                         ✓
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    -
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                -
TYPE: posix-extended:                             -
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: {}
TYPE: awk:                                        -
TYPE: egrep:                                      ✓
TYPE: ed:                                         -
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       -
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
== REGEX FEATURE: \+
TYPE: awk:                                        -
TYPE: egrep:                                      -
TYPE: ed:                                         ✓
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    -
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                -
TYPE: posix-extended:                             -
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        ✓
== REGEX FEATURE: +
TYPE: awk:                                        ✓
TYPE: egrep:                                      ✓
TYPE: ed:                                         -
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       -
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
== REGEX FEATURE: \b
TYPE: awk:                                        -
TYPE: egrep:                                      ✓
TYPE: ed:                                         ✓
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: \< \>
TYPE: awk:                                        -
TYPE: egrep:                                      ✓
TYPE: ed:                                         ✓
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: [:digit:]
TYPE: awk:                                        ✓
TYPE: egrep:                                      ✓
TYPE: ed:                                         ✓
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       ✓
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                ✓
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        ✓
TYPE: sed:                                        ✓
== REGEX FEATURE: \d
TYPE: awk:                                        -
TYPE: egrep:                                      -
TYPE: ed:                                         -
TYPE: emacs:                                      -
TYPE: gnu-awk:                                    -
TYPE: grep:                                       -
TYPE: posix-awk:                                  -
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                -
TYPE: posix-extended:                             -
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
== REGEX FEATURE: \s
TYPE: awk:                                        ✓
TYPE: egrep:                                      ✓
TYPE: ed:                                         -
TYPE: emacs:                                      ✓
TYPE: gnu-awk:                                    ✓
TYPE: grep:                                       -
TYPE: posix-awk:                                  ✓
TYPE: posix-basic:                                -
TYPE: posix-egrep:                                ✓
TYPE: posix-extended:                             ✓
TYPE: posix-minimal-basic:                        -
TYPE: sed:                                        -
=================== BSD find ===================
== REGEX FEATURE: \{\}
TYPE: basic:                                      ✓
TYPE: extended:                                   -
== REGEX FEATURE: {}
TYPE: basic:                                      -
TYPE: extended:                                   ✓
== REGEX FEATURE: \+
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: +
TYPE: basic:                                      -
TYPE: extended:                                   ✓
== REGEX FEATURE: \b
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: \< \>
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: [:digit:]
TYPE: basic:                                      ✓
TYPE: extended:                                   ✓
== REGEX FEATURE: \d
TYPE: basic:                                      -
TYPE: extended:                                   -
== REGEX FEATURE: \s
TYPE: basic:                                      -
TYPE: extended:                                   ✓
mklement0
  • worked like a charm!!, it works even without `-type d` –  Nov 29 '16 at 14:52
  • @ADPK: Glad to hear it; yes, if the name pattern is specific enough, you don't strictly need `-type d`, but adding it will help performance, because `find` will then only look at directories. – mklement0 Nov 29 '16 at 14:55
-1

When you have a full path of a file, then you don't need a regex to extract the directory name.

dirname "/path/20161128-20:34:33:432813246/data.txt" 

will give you

/path/20161128-20:34:33:432813246

If you really want a regex, try this:

\d{8}-\d{2}:\d{2}:\d{2}:\d{9}
pkalinow
  • 2
    I feel this cannot be the answer, as the question was more specific on arriving at the wildcard or correct -regex. – Deepak Nov 29 '16 at 14:30
  • Given that the OP did not show any attempt at defining a regular expression in the question, it would not surprise me if this is exactly what they were looking for. – chepner Nov 29 '16 at 14:34
  • 1
    The OP is using `'*data.txt'` as an _incidental example_ of a pattern that happens to work, but the intent is clearly to match the directory names directly, and to have `find` output the directory paths directly (as opposed to having to apply `dirname` after the fact). As for your regex: (a) Neither GNU nor BSD `find` support character-class shortcut `\d`, and (b) for use with `-regex`, the _entire path_ must be matched, not just the filename part. – mklement0 Nov 30 '16 at 03:06