2

I am new to bash scripting, and I'm having an issue with one of my scripts. I'm trying to compose a list of Drivers Under 25 after reading their birthdates in from a folder filled with XML files and calculating their ages. Once I have determined they are under 25, the filename of the driver's data is saved to a text file. The script is working up until a certain point and then it stops. The error I'm getting is:

gdate: extra operand ‘+%s’
Try 'gdate --help' for more information.
DriversUnder25.sh: line 24: ( 1471392000 -  )/60/60/24 : syntax error: operand expected (error token is ")/60/60/24 ")

Here is my code:

#!/bin/bash

# define directory to search and current date
DIRECTORY="/*.xml"
CURRENT_DATE=$(date '+%Y%m%d')

# loop over files in a directory
for FILE in $DIRECTORY;
do
  # grab user's birth date from XML file
  BIRTH_DATE=$(sed -n '/Birthdate/{s/.*<Birthdate>//;s/<\/Birthdate.*//;p;}' $FILE)

  # calculate the difference between the current date
  # and the user's birth date (seconds)
  DIFFERENCE=$(( ( $(gdate -ud $CURRENT_DATE +'%s') - $(gdate -ud $BIRTH_DATE +'%s') )/60/60/24 ))

  # calculate the number of years between
  # the current date and the user's birth date
  YEARS=$(($DIFFERENCE / 365))

  # if the user is under 25
  if [ "$YEARS" -le 25 ]; then
    # save file name only
    FILENAME=`basename $FILE`
    # output filename to text file
    echo $FILENAME >> DriversUnder25.txt
  fi
done

I'm not sure why it correctly outputs the first 10 filenames and then stops. Any ideas why this may be happening?

Charles Duffy
  • 280,126
  • 43
  • 390
  • 441
kbdev
  • 1,225
  • 1
  • 13
  • 33
  • check the value of `$BIRTH_DATE`. You can just echo it to stderr to see it with `echo birthdate is \"${BIRTH_DATE}\"" 1>&2` – Tim B Aug 17 '16 at 13:52
  • The result I get is `birthdate is "19920215"` – kbdev Aug 17 '16 at 14:01
  • use `set -x` to see how variable values are being expanded. You should see on your 10th+ record that the data is changing. Is it possible there is something wrong with your set of XML files, where the date format is different or other problems? Good luck. – shellter Aug 17 '16 at 14:06
  • And, BTW, much better than average shell Q. Keep posting! – shellter Aug 17 '16 at 14:10
  • 1
    I think I have figured out why this is failing. Some XML files have multiple drivers in them, and thus are returning multiple birthdates. I'll have to figure out how to handle this. Thanks for your help! – kbdev Aug 17 '16 at 14:12
  • Trying to use `sed` to extract content from XML is the Wrong Thing to start with. A tool such as `xmlstarlet` is going to be *far* better behaved (aware of comments, CDATA sections, namespaces, etc etc etc). – Charles Duffy Aug 17 '16 at 14:57
  • ...if you provided the format of our XML, one could potentially provide an answer showing how to address a specific driver. – Charles Duffy Aug 17 '16 at 15:14
  • @CharlesDuffy I was hoping to not use another dependency, in case someone else on my team needs to use this code. Also, I've only been writing shell scripts for 2 days, so I wasn't aware of this tool. Thanks for the suggestion; I will keep it in mind for the future. – kbdev Aug 17 '16 at 15:19
  • If you want to post your own answer, use the "Add an Answer" button to do so -- that way it can be separately commented on, voted on, etc -- rather than editing it into the question. – Charles Duffy Aug 17 '16 at 15:21
  • ...as for dependency management, XMLStarlet can generate XSLT templates, which can be applied by anyone with `xsltproc` -- which just about every modern Unixlike (MacOS X included) ships with out-of-the-box. Another common approach for folks trying to minimize dependencies is to use the Python interpreter's standard-library XML modules -- it's typically straightforward to implement a shell function that wraps a very short Python script for whatever you're trying to accomplish. – Charles Duffy Aug 17 '16 at 15:23
  • btw -- see [DontReadLinesWithFor](http://mywiki.wooledge.org/DontReadLinesWithFor), and consider running your code through http://shellcheck.net/ – Charles Duffy Aug 17 '16 at 15:26

3 Answers3

3

You need to quote the expansion of $BIRTH_DATE to prevent word splitting on the whitespace in the value. (It is good practice to quote all your parameter expansions, unless you have a good reason not to, for this very reason.)

DIFFERENCE=$(( ( $(gdate -ud "$CURRENT_DATE" +'%s') - $(gdate -ud "$BIRTH_DATE" +'%s') )/60/60/24 ))

(Based on your comment, this would probably at least allow gdate to give you a better error message.)

chepner
  • 497,756
  • 71
  • 530
  • 681
1

A best-practices implementation would look something like this:

directory=/ # patch as appropriate
current_date_unix=$(date +%s)

for file in "$directory"/*.xml; do
    while IFS= read -r birth_date; do
        birth_date_unix=$(gdate -ud "$birth_date" +'%s')
        difference=$(( ( current_date_unix - birth_date_unix ) / 60 / 60 / 24 ))
        years=$(( difference / 365 ))
        if (( years < 25 )); then
            echo "${file%.*}"
        fi
    done < <(xmlstarlet sel -t -m '//Birthdate' -v . -n <"$file")
done >DriversUnder25.txt

If this script needs to be usable my folks who don't have xmlstarlet installed, you can generate an XSLT template and then use xsltproc (which is available out-of-the-box on modern opertaing systems).

That is to say, if you run this once, and bundle its output with your script:

xmlstarlet sel -C -t -m '//Birthdate' -v . -n  >get-birthdays.xslt

...then the script can be modified to replace xmlstarlet with:

xsltproc get-birthdays.xslt - <"$file"

Notes:

  • The XML input files are being read with an actual XML parser.
  • When expanding for file in "$directory"/*.xml, the expansion is quoted but the glob is not (thus allowing the script to operate on directories with spaces, glob characters, etc. in their names).
  • The output file is being opened once, for the loop, rather than once per line of output (reducing overhead unnecessarily opening and closing files).
  • Lower-case variable names are in use to comply with POSIX conventions (specifying that variables with meaning to the operating system and shell have all-upper-case names, and that the set of names with at least one lower-case character is reserved for application use; while the docs in question are with respect to environment variables, shell variables share a namespace, making the convention relevant).
Charles Duffy
  • 280,126
  • 43
  • 390
  • 441
0

The issue was that there were multiple drivers in some files, thus importing multiple birth dates into the same string. My solution is below:

#!/bin/bash

# define directory to search and current date
DIRECTORY="/*.xml"
CURRENT_DATE=$(date '+%Y%m%d')

# loop over files in a directory
for FILE in $DIRECTORY;
do
  # set flag for output to false initially
  FLAG=false

  # grab user's birth date from XML file
  BIRTH_DATE=$(sed -n '/Birthdate/{s/.*<Birthdate>//;s/<\/Birthdate.*//;p;}' $FILE)

  # loop through birth dates in file (there can be multiple drivers)
  for BIRTHDAY in $BIRTH_DATE;
  do
    # calculate the difference between the current date
    # and the user's birth date (seconds)
    DIFFERENCE=$(( ( $(gdate -ud $CURRENT_DATE +'%s') - $(gdate -ud $BIRTHDAY +'%s') )/60/60/24))

    # calculate the number of years between
    # the current date and the user's birth date
    YEARS=$(($DIFFERENCE / 365))

    # if the user is under 25
    if [ "$YEARS" -le 25 ]; then
      # save file name only
      FILENAME=`basename $FILE`
      # set flag to true (driver is under 25 years of age)
      FLAG=true
    fi
  done

  # if there is a driver under 25 in the file
  if $FLAG == true; then
    # output filename to text file
    echo $FILENAME >> DriversUnder25.txt
  fi
done
John Kugelman
  • 349,597
  • 67
  • 533
  • 578
kbdev
  • 1,225
  • 1
  • 13
  • 33
  • 2
    Perhaps you mean `if [ "$FLAG" = true ]`? If `FLAG` is guaranteed to be either `true` or `false` this will work as-written, but it works for a somewhat-accidental reason (because either of these commands ignores its arguments and returns an exit status reflecting its name). Thus, `if $FLAG == false` or `if $FLAG foobar` will behave exactly the same way as `if $FLAG == true`. – Charles Duffy Aug 17 '16 at 15:48
  • (...after the self-accept timeout expires, btw, you can mark the question solved by clicking the checkbox by this answer). – Charles Duffy Aug 17 '16 at 15:50