
I'm hoping this is a simple question, since I've never done shell scripting before. I'm trying to filter certain files out of a list of results. While the script executes and prints out a list of files, it's not filtering out the ones I don't want. Thanks for any help you can provide!

#!/bin/bash

# Purpose: Identify all *md files in H2 repo where there is no audit date
#
#
#
# Example call: no_audits.sh
#
# If that call doesn't work, try ./no_audits.sh
#
# NOTE: Script assumes you are executing from within the scripts directory of
#       your local H2 git repo.
#
# Process:
# 1) Go to H2 repo content directory (assumption is you are in the scripts dir)
# 2) Use for loop to go through all *md files in each content sub dir
#    and list all file names and directories where audit date is null
#

#set counter
count=0

# Go to content directory and loop through all 'md' files in sub dirs
cd ../content

FILES=`find .  -type f -name '*md' -print`

for f in $FILES
do

   if [[ $f == "*all*" ]] || [[ $f == "*index*" ]] ;
   then
      # code to skip
      echo " Skipping file:  " $f
      continue
   else
   # find audit_date in file metadata
   adate=`grep audit_date $f`

   # separate actual dates from rest of the grepped line
   aadate=`echo $adate | awk -F\' '{print $2}'`

   # if audit date is null - proceed
      if [[ -z "$aadate" ]] ;
      then

         # print a list of all files without audit dates
         echo "Audit date: " $aadate " " $f;
         count=$((count+1));
      fi
   fi
done
echo $count " files without audit dates "
Kate D.
  • Add expected output... – Gilles Quénot Mar 14 '18 at 21:35
  • When building a StackOverflow question, please try to create the shortest code that generates a specific problem. In this case, that might be done by tracing your code with `bash -x yourscript`, watching each command it runs, finding the first command that doesn't behave the way you expect, and asking only about that one command. See also the documentation on building a [mcve]. – Charles Duffy Mar 14 '18 at 21:35
  • That said, there are also a number of antipatterns here. Consider running your code through http://shellcheck.net/; and also see [BashPitfalls #1](http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29) / [DontReadLinesWithFor](http://mywiki.wooledge.org/DontReadLinesWithFor) -- and, for the practices you *should* follow, [UsingFind](http://mywiki.wooledge.org/UsingFind). The current code will misbehave badly if any filenames contain whitespace, can be interpreted as globs, etc. – Charles Duffy Mar 14 '18 at 21:36
  • Immediate issue is that the quotes in `[[ $f == "*all*" ]]` mean you're excluding only files named `*all*`, not files with `all` in their name. For the latter, it would be `[[ $f = *all* ]]`, without the quotes. And yes, this should be done with a better `find` expression rather than postprocessing in bash at all. – Charles Duffy Mar 14 '18 at 21:46

2 Answers


First, to address the immediate issue:

[[ $f == "*all*" ]]

is true only if the value of f is exactly the string *all*, with the asterisks as literal characters. To check for a substring, leave the asterisks unquoted:

[[ $f = *all* ]]

...is a better-practice solution. (Note the use of = rather than ==. This isn't essential, but it is a good habit, as the POSIX test command is only specified to permit = as a string comparison operator; if one writes [ "$f" == foo ] by habit, one can get unexpected failures on platforms with a strictly compliant /bin/sh.)
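To make the difference concrete, here is a minimal sketch of the quoted and unquoted forms side by side. The path in f and the variable names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample path, chosen only for illustration.
f=./docs/all.md

# Quoted right-hand side: compared as a literal string, so this does not match.
if [[ $f == "*all*" ]]; then quoted=match; else quoted=nomatch; fi

# Unquoted right-hand side: treated as a glob pattern, so this matches.
if [[ $f = *all* ]]; then unquoted=match; else unquoted=nomatch; fi

# Only a value that is literally the five characters *all* satisfies the quoted form.
literal='*all*'
if [[ $literal == "*all*" ]]; then literal_quoted=match; else literal_quoted=nomatch; fi

printf 'quoted=%s unquoted=%s literal_quoted=%s\n' "$quoted" "$unquoted" "$literal_quoted"
# prints: quoted=nomatch unquoted=match literal_quoted=match
```

Note that this pattern matching is a feature of bash's [[ ]]; inside single-bracket [ ], an unquoted *all* would be expanded against files in the current directory instead.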


That said, a ground-up implementation of this script intended to follow best practices might look more like the following:

#!/usr/bin/env bash
count=0
while IFS= read -r -d '' filename; do
  aadate=$(awk -F"'" '/audit_date/ { print $2; exit; }' <"$filename")
  if [[ -z $aadate ]]; then
    (( ++count ))
    printf 'File %q has no audit date\n' "$filename"
  else
    printf 'File %q has audit date %s\n' "$filename" "$aadate"
  fi
done < <(find . -not '(' -name '*all*' -o -name '*index*' ')' -type f -name '*md' -print0)
echo "Found $count files without audit dates" >&2

Note:

  • An arbitrary list of filenames cannot be stored in a single bash string (because all characters that might otherwise be used to determine where the first name ends and the next name begins could be present in the name itself). Instead, read one NUL-delimited filename at a time -- emitted with find -print0, read with IFS= read -r -d ''; this is discussed in [BashFAQ #1].
  • Filtering out unwanted names can be done internal to find.
  • There's no need to preprocess input to awk using grep, as awk is capable of searching through input files itself.
  • < <(...) is used to avoid the behavior in BashFAQ #24, wherein content piped to a while loop causes variables set or modified within that loop to become unavailable after its exit.
  • printf '...%q...\n' "$name" is safer than echo "...$name..." when handling unknown filenames, as printf will emit printable content that accurately represents those names even if they contain unprintable characters or characters which, when emitted directly to a terminal, act to modify that terminal's configuration.
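The points about NUL-delimited reads and < <(...) can be seen in a small, self-contained sketch. It assumes default bash settings (in particular, that the lastpipe option is not enabled); the scratch directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical file names, one containing a space.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.md" "$tmpdir/with space.md"

# Pipeline: the loop runs in a subshell, so its increments are lost on exit.
piped=0
find "$tmpdir" -type f -name '*md' -print0 |
  while IFS= read -r -d '' name; do (( ++piped )); done

# Process substitution: the loop runs in the current shell, so the count survives.
kept=0
while IFS= read -r -d '' name; do
  (( ++kept ))
  printf 'Saw %q\n' "$name"   # %q shows the space in shell-escaped form
done < <(find "$tmpdir" -type f -name '*md' -print0)

printf 'piped=%d kept=%d\n' "$piped" "$kept"
# prints: piped=0 kept=2

rm -rf -- "$tmpdir"
```

Both loops read the same two names, including the one with a space, but only the second leaves its counter visible to the rest of the script.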
Charles Duffy
  • `<` seems unnecessary for the input file to `awk`? – Mark Setchell Mar 14 '18 at 22:08
  • @MarkSetchell, true, it'd be perfectly legal to just pass the filename as a separate argument. The argument is that this way we can handle even unusual filenames without a `--` sigil... but as such names are guaranteed to be prefixed with `./` when they're coming from `find`, and as a strictly POSIX-compliant command-line parser will treat anything after the first positional argument as positional anyhow, either approach is perfectly fine (I tend to be wary since GNU's command-line parsers often aren't strictly POSIX-compliant). – Charles Duffy Mar 14 '18 at 22:10

Never mind, I found the answer here:

bash script to check file name begins with expected string

I tried various versions of the wildcard/filename and ended up with:

if [[ "$f" == *all.md ]] || [[ "$f" == *index.md ]] ;

The link above said not to put those in quotes, and removing the quotes did the trick!
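As a rough illustration of why the suffix patterns can be preferable to a plain substring match, the sketch below compares the two on a handful of hypothetical paths (none of them come from the real repo).

```shell
#!/usr/bin/env bash
# Hypothetical paths; note that "install" contains the substring "all".
paths=(./install/guide.md ./topics/all.md ./topics/index.md ./topics/intro.md)

skipped_substring=()
skipped_suffix=()
for f in "${paths[@]}"; do
  # Substring pattern: skips anything with "all" or "index" anywhere in the path.
  if [[ $f = *all* || $f = *index* ]]; then skipped_substring+=("$f"); fi
  # Suffix pattern: skips only paths that end in all.md or index.md.
  if [[ $f = *all.md || $f = *index.md ]]; then skipped_suffix+=("$f"); fi
done

printf 'substring skips %d: %s\n' "${#skipped_substring[@]}" "${skipped_substring[*]}"
printf 'suffix skips %d: %s\n' "${#skipped_suffix[@]}" "${skipped_suffix[*]}"
```

The substring form also skips ./install/guide.md, which is probably not intended. The suffix form is narrower, though it would still match a name such as small.md, since that also ends in all.md.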

Kate D.
  • I believe that issue was also identified [in a comment](https://stackoverflow.com/questions/49287970/bash-script-not-filtering#comment85579357_49287970) ~20 minutes prior to this answer's posting. – Charles Duffy Mar 14 '18 at 22:17