Your problem isn't so much reading from the directory to formulate your checks; rather, it is finding a way to generate the filenames between two dates so you can check whether any are missing. While bash isn't the fastest tool for this kind of checking, it will do for checking two months' worth of minutes once in a while.
There are many ways to approach the problem. One of the first that came to mind is to take the beginning and end filenames as arguments, generate the filenames between the two dates, check whether each file exists, and report any that does not. The utility seq will generate the needed sequences. There are several other utilities that are a bit more flexible, but seq is ubiquitous.
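As a quick illustration of the kind of sequence seq produces, here is a minimal sketch. The helper name minute_names is hypothetical and exists only for this example; the -f "%02g" format is what keeps the minutes zero-padded.

```bash
#!/bin/bash
# minute_names is a hypothetical helper used only for illustration.
# Usage: minute_names BASE FIRST LAST   (BASE is the filename up to the hour)
minute_names() {
    local base=$1 M
    # seq -f "%02g" prints each number padded to two digits: 08 09 10 ...
    for M in $(seq -f "%02g" "$2" "$3"); do
        printf "%s-%s\n" "$base" "$M"
    done
}

minute_names A_2015-01-01_23 8 10
# A_2015-01-01_23-08
# A_2015-01-01_23-09
# A_2015-01-01_23-10
```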
Setting up the logic takes a multi-part approach. Basically, you need to determine what needs to be tested between the start and end filenames. For instance, if the start and end are less than an hour apart, you only need to check the changing minutes between start and end, and so on.
Below, I've given examples of the logic for handling a month's worth of files incremented by minutes. I've left the multi-month implementation for you if you find it necessary. If the format changes, just adjust the parameter expansion/substring removal used to parse each part of the datestring. Give it a try:
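One detail worth keeping in mind when comparing fields parsed from zero-padded names: bash arithmetic treats a leading zero as octal, so values such as 08 and 09 are rejected unless base 10 is forced with the 10# prefix. A minimal sketch, using a hypothetical helper named hours_between that is not part of the script in this answer:

```bash
#!/bin/bash
# hours_between is a hypothetical helper used only for illustration.
# Prints the number of whole hours strictly between two zero-padded hour fields.
hours_between() {
    local hs=$((10#$1)) he=$((10#$2))   # 10# forces base 10, so 08 and 09 are accepted
    local n=$((he - hs - 1))
    ((n < 0)) && n=0                    # same or adjacent hours: nothing in between
    printf '%d\n' "$n"
}

hours_between 08 11    # prints 2 (hours 09 and 10)
```

A result of 0 means only the minutes of the first and last hour need checking.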
#!/bin/bash
fstart="$1" # starting filename
fend="$2" # ending filename
## initial trim of filename from left
ystart=${fstart#*_} # start year
mstart=${ystart#*-} # start month
dstart=${mstart#*-} # start day
Hstart=${dstart#*_} # start Hour
Mstart=${fstart##*-} # start Minute
yend=${fend#*_} # end year
mend=${yend#*-} # end month
dend=${mend#*-} # end day
Hend=${dend#*_} # end Hour
Mend=${fend##*-} # end Minute
## final trim of filename from right
ystart=${ystart%%-*}
mstart=${mstart%%-*}
dstart=${dstart%%_*}
Hstart=${Hstart%%-*}
yend=${yend%%-*}
mend=${mend%%-*}
dend=${dend%%_*}
Hend=${Hend%%-*}
## base filename w/o day (e.g. A_2015-01)
fday=${fstart%_*}
fday=${fday%-*}
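## check to end of first hour (or only to the end minute when
## start and end fall within the same hour of the same day)
if ((10#$dend == 10#$dstart && 10#$Hend == 10#$Hstart)); then
    Mlast=$Mend; samehour=1
else
    Mlast=59; samehour=0
fi
for M in $(seq -f "%02g" "$Mstart" "$Mlast"); do
    [ -e "${fstart%_*}_$Hstart-$M" ] || printf " missing: %s\n" "${fstart%_*}_$Hstart-$M"
    # printf " checking: %s\n" "${fstart%_*}_$Hstart-$M"
done
if ((samehour)); then
    printf "check complete\n"
    exit 0
fi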
## check remaining hours in 1st day
## (10# forces base 10 so that fields like 08 and 09 are not read as octal)
if ((10#$dend > 10#$dstart)); then
    for H in $(seq -f "%02g" $((10#$Hstart+1)) 23); do
        for M in $(seq -f "%02g" 0 59); do
            [ -e "${fstart%_*}_$H-$M" ] || printf " missing: %s\n" "${fstart%_*}_$H-$M"
            # printf " checking: %s\n" "${fstart%_*}_$H-$M"
        done
    done
else
    ## start and end are on the same day: check full hours between them
    for H in $(seq -f "%02g" $((10#$Hstart+1)) $((10#$Hend-1))); do
        for M in $(seq -f "%02g" 0 59); do
            [ -e "${fstart%_*}_$H-$M" ] || printf " missing: %s\n" "${fstart%_*}_$H-$M"
            # printf " checking: %s\n" "${fstart%_*}_$H-$M"
        done
    done
    ## handle minutes in last hour
    for M in $(seq -f "%02g" 0 "$Mend"); do
        [ -e "${fstart%_*}_$Hend-$M" ] || printf " missing: %s\n" "${fstart%_*}_$Hend-$M"
        # printf " checking: %s\n" "${fstart%_*}_$Hend-$M"
    done
    printf "check complete\n"
    exit 0
fi
## check all hours in any full days between start/end
if ((10#$dend > 10#$dstart + 1)); then ## full days exist before end day
    for d in $(seq -f "%02g" $((10#$dstart+1)) $((10#$dend-1))); do
        for H in $(seq -f "%02g" 0 23); do
            for M in $(seq -f "%02g" 0 59); do
                [ -e "${fday}-${d}_$H-$M" ] || printf " missing: %s\n" "${fday}-${d}_$H-$M"
                # printf " checking: %s\n" "${fday}-${d}_$H-$M"
            done
        done
    done
fi
## check full hours in the last day
for H in $(seq -f "%02g" 0 $((10#$Hend-1))); do
    for M in $(seq -f "%02g" 0 59); do
        [ -e "${fend%_*}_$H-$M" ] || printf " missing: %s\n" "${fend%_*}_$H-$M"
        # printf " checking: %s\n" "${fend%_*}_$H-$M"
    done
done
## handle minutes in last hour
for M in $(seq -f "%02g" 0 "$Mend"); do
    [ -e "${fend%_*}_$Hend-$M" ] || printf " missing: %s\n" "${fend%_*}_$Hend-$M"
    # printf " checking: %s\n" "${fend%_*}_$Hend-$M"
done
## Add Year/Month Iteration
printf "check complete\n"
exit 0
Above, you see the test printf statements commented out. With them enabled, the names generated across a change of hour (and day) are:
Example Checks
$ bash filepermin.sh A_2015-01-01_23-50 A_2015-01-02_00-15
checking: A_2015-01-01_23-50
checking: A_2015-01-01_23-51
checking: A_2015-01-01_23-52
checking: A_2015-01-01_23-53
checking: A_2015-01-01_23-54
checking: A_2015-01-01_23-55
checking: A_2015-01-01_23-56
checking: A_2015-01-01_23-57
checking: A_2015-01-01_23-58
checking: A_2015-01-01_23-59
checking: A_2015-01-02_00-00
checking: A_2015-01-02_00-01
checking: A_2015-01-02_00-02
checking: A_2015-01-02_00-03
checking: A_2015-01-02_00-04
checking: A_2015-01-02_00-05
checking: A_2015-01-02_00-06
checking: A_2015-01-02_00-07
checking: A_2015-01-02_00-08
checking: A_2015-01-02_00-09
checking: A_2015-01-02_00-10
checking: A_2015-01-02_00-11
checking: A_2015-01-02_00-12
checking: A_2015-01-02_00-13
checking: A_2015-01-02_00-14
checking: A_2015-01-02_00-15
check complete
Actual Test (with A_2015-01-01_00-31 missing)
As a short test, 120 files were created with:
$ touch A_2015-01-01_00-{00..59}
$ touch A_2015-01-01_01-{00..59}
Deleting A_2015-01-01_00-31 and running the test yielded:
$ bash ../filepermin.sh A_2015-01-01_00-00 A_2015-01-01_01-59
missing: A_2015-01-01_00-31
check complete
Note: there are probably several additional ways to generate the sequences needed; this is just one example of an approach. Another option is to read all the filenames into an array and do a sequential check of the names for any that are more than 1 apart. However, you then run into issues with native file sorting, and the fact that two months' worth of minutes is 80K+ filenames. That's getting into the range where bash can get very slow.
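As one illustration of such an alternative, the sketch below lets date arithmetic generate the names, which also carries across month and year boundaries. It is only a sketch under stated assumptions: it needs GNU date (for -d) and bash 4.2 or later (for the printf '%(...)T' format), it assumes the same PREFIX_YYYY-MM-DD_HH-MM naming, and to_epoch and gen_names are hypothetical helper names. Times are treated as UTC so daylight-saving changes cannot introduce gaps.

```bash
#!/bin/bash
# Sketch only: assumes GNU date (for -d) and bash 4.2+ (for printf '%(...)T').
# to_epoch and gen_names are hypothetical helpers, not part of the script above.

# 2015-01-31_23-58 -> seconds since the epoch, interpreted as UTC
to_epoch() {
    local day=${1%_*} hm=${1#*_}
    date -u -d "$day ${hm/-/:}" +%s
}

# print every per-minute filename from $1 through $2, inclusive
gen_names() {
    local prefix=${1%%_*} t end
    t=$(to_epoch "${1#*_}") || return 1
    end=$(to_epoch "${2#*_}") || return 1
    (
        export TZ=UTC                   # UTC has no daylight-saving jumps
        while ((t <= end)); do
            # the printf builtin formats the timestamp without forking date
            printf '%s_%(%Y-%m-%d_%H-%M)T\n' "$prefix" "$t"
            ((t += 60))
        done
    )
}

# when called with start and end filenames, report any missing file
if [ $# -eq 2 ]; then
    gen_names "$1" "$2" | while IFS= read -r f; do
        [ -e "$f" ] || printf " missing: %s\n" "$f"
    done
fi
```

Only two date processes are started regardless of the span, so the loop itself stays reasonably quick even for tens of thousands of names.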
Check by Reading Files Into Array
If you are inclined to try reading the files into an array, a much shorter approach can be taken, with two caveats: native sort order may present a problem, and you can identify the files surrounding a missing file, but not precisely the missing file itself. Simply change to the directory containing the files and try something like:
#!/bin/bash
a=( * )
for ((i = 1; i < ${#a[@]}; i++)); do
    n=${a[i]} ## next date
    n=${n##*-}
    n=${n/#0/}
    p=${a[$((i-1))]} ## prev date
    p=${p##*-}
    p=${p/#0/}
    [ "$n" -eq 0 ] && n=60 ## adjust for test on roll to next hour
    (((n - p) != 1)) && echo "file missing prior to ${a[i]}"
done
If the minutes of any next/prev pair of filenames do not differ by exactly 1, the script will flag a file as missing prior to the current filename. For example, removing A_2015-01-01_01-00 from a sequence of files would trigger:
$ bash ../fpm.sh
file missing prior to A_2015-01-01_01-01