
I have a bunch of files with timestamps in the filename:
A_2015-01-01_00-01 A_2015-01-01_00-02 A_2015-01-01_00-03 A_2015-01-01_00-04

The folder is full of files covering 2 months, one file per minute. I was wondering if there's a quick way to check whether a timestamp is missing, without, for example, building a dictionary of all timestamps and running a comparison. Whenever a minute entry is missing (so that minute is skipped), I'd like to get the two files surrounding the missing minute or time frame. I'm new to coding in general and was wondering whether something like this is possible with a bash script.

Peter S
  • Calculate the number of files expected to be present, then run a script to check whether that many files start with Pattern_. – Martin Sep 09 '15 at 18:42
  • Thanks, that sounds good, but I forgot to mention that I already know a few entries are missing; what I want to know is which minutes on which days are missing. My calculations tell me that 4 minutes (4 files) are missing, out of one file per minute, 24 hours a day, for approx. 2 months. – Peter S Sep 09 '15 at 19:04

5 Answers


Do you want the script to run in real time?

If so, consider something like:

  • monitor your folder for new files (use inotifywait)
  • for every new file, check whether a file exists whose name is one minute earlier
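A minimal sketch of those two steps, assuming inotify-tools is installed, GNU `date` is available, and the files are named A_YYYY-MM-DD_HH-MM and can be read as UTC. The function names `prev_name` and `watch_dir` are hypothetical. The previous minute's name is computed by converting the timestamp to epoch seconds, subtracting 60, and formatting it back:

```shell
#!/bin/bash
# Sketch: assumes inotify-tools (inotifywait), GNU date, and filenames
# of the form A_YYYY-MM-DD_HH-MM, treated as UTC.

# Print the filename for the minute before the given one.
prev_name() {
    local ts=${1#A_}                  # 2015-01-01_00-04
    local hm=${ts#*_}                 # 00-04
    local secs
    secs=$(date -u -d "${ts%_*} ${hm/-/:}" +%s) || return 1
    date -u -d "@$((secs - 60))" "+A_%Y-%m-%d_%H-%M"
}

# Watch a folder and report a gap whenever a new file's predecessor is absent.
watch_dir() {
    local dir=$1 f prev
    inotifywait -m -e create --format '%f' "$dir" | while read -r f; do
        prev=$(prev_name "$f") || continue
        [ -e "$dir/$prev" ] || echo "missing before $f: $prev"
    done
}
```

For example, `prev_name A_2015-01-01_00-00` gives A_2014-12-31_23-59, so day and year rollovers are handled by `date`. This only detects a gap once a later file arrives; it cannot tell you that files have stopped appearing altogether.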
Lyes BEN
  • How do you compute "the name having -1 minute" in bash? – melpomene Sep 09 '15 at 18:54
  • This would not detect the case where files stop appearing completely. – melpomene Sep 09 '15 at 18:54
  • No, the files are already there, and I have to look through them to find the (very few) missing files. – Peter S Sep 09 '15 at 19:00
  • To compute "the name having -1 minute", you can either use a regular expression, or parse the timestamp and then go 1 minute backwards. – Lyes BEN Sep 09 '15 at 19:02
  • "use regular expression, or parse the timestamp then get it 1 minute backwards" - how? – melpomene Sep 09 '15 at 19:02
  • Well, your date format is not recognized by Linux `date`, otherwise it would be easier. I can give you a Python implementation if you can run Python. – Lyes BEN Sep 09 '15 at 19:35
  • Oh okay, I didn't know Linux had a problem with that date format; good to know for future projects. Yes, I could run Python for it. – Peter S Sep 09 '15 at 20:02

Your examples use different formats. Assuming the real format is A_2015-01-01_00:04, this could help:

#!/bin/bash
# Requires GNU date (the -d option); BSD/OS X date does not accept it.

START="A_2015-01-01_00:01"
FINISH="A_2015-01-01_00:08"

NEXT_FILE="$START"
[ -f "$NEXT_FILE" ] || echo "$NEXT_FILE"

while [ "$NEXT_FILE" != "$FINISH" ]; do
        TS=$(echo "$NEXT_FILE" | cut -d "_" -f2- | tr "_" " ")   # 2015-01-01 00:01
        # Exit instead of looping forever if date cannot parse the timestamp
        NEXT_MIN=$(date -d "$TS 1 minute" "+%Y-%m-%d_%H:%M") || exit 1
        NEXT_FILE="A_$NEXT_MIN"
        [ -f "$NEXT_FILE" ] || echo "$NEXT_FILE"
done

Now, using the format A_2015-01-01_00-04:

#!/bin/bash
# Requires GNU date (the -d option); BSD/OS X date does not accept it.

START="A_2015-01-01_00-01"
FINISH="A_2015-01-01_00-08"

NEXT_FILE="$START"
[ -f "$NEXT_FILE" ] || echo "$NEXT_FILE"

while [ "$NEXT_FILE" != "$FINISH" ]; do
        TS=$(echo "$NEXT_FILE" | cut -d "_" -f2-)               # 2015-01-01_00-01
        DAY=$(echo "$TS" | cut -d "_" -f1)                      # 2015-01-01
        TIME=$(echo "$TS" | cut -d "_" -f2 | tr "-" ":")        # 00:01

        # Exit instead of looping forever if date cannot parse the timestamp
        NEXT_MIN=$(date -d "$DAY $TIME 1 minute" "+%Y-%m-%d_%H-%M") || exit 1
        NEXT_FILE="A_$NEXT_MIN"

        [ -f "$NEXT_FILE" ] || echo "$NEXT_FILE"
done

This will show the missing files between START and FINISH, inclusive. You just need to define your START and FINISH files; you can also modify the script so that those values are passed as parameters.
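As a sketch of the parameter version, assuming GNU `date` and the A_YYYY-MM-DD_HH-MM format: this variation steps through epoch seconds instead of comparing filenames, so a mistyped FINISH cannot make the loop run forever. The names `name_to_epoch` and `missing_between` are hypothetical, and the timestamps are treated as UTC (i.e. assumed not to be shifted by DST).

```shell
#!/bin/bash
# Sketch: assumes GNU date and filenames A_YYYY-MM-DD_HH-MM in the current
# directory. Usage: ./missing.sh START FINISH

# Convert a filename to epoch seconds (treated as UTC, so no DST jumps).
name_to_epoch() {
    local ts=${1#A_}            # 2015-01-01_00-04
    local hm=${ts#*_}           # 00-04
    date -u -d "${ts%_*} ${hm/-/:}" +%s
}

# Print every missing filename from START to FINISH, inclusive.
missing_between() {
    local s e f
    s=$(name_to_epoch "$1") || return 1
    e=$(name_to_epoch "$2") || return 1
    # Comparing numbers (not names) means a bad FINISH cannot loop forever.
    while ((s <= e)); do
        f=$(date -u -d "@$s" "+A_%Y-%m-%d_%H-%M")
        [ -f "$f" ] || echo "$f"
        s=$((s + 60))
    done
}

if [ $# -eq 2 ]; then
    missing_between "$1" "$2"
fi
```

Saved as, say, missing.sh, it would be run as `./missing.sh A_2015-01-01_00-01 A_2015-03-01_00-00`. Like the original, it calls `date` once per minute, so two months (roughly 88K minutes) will take a while.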

Alvaro Flaño Larrondo
  • Thanks for the fast reply. I changed the timestamp; sorry for the mistake. – Peter S Sep 09 '15 at 19:05
  • OK. I will update the answer with the modified version. – Alvaro Flaño Larrondo Sep 09 '15 at 19:05
  • Thanks a lot. I'll give it a try. – Peter S Sep 09 '15 at 19:08
  • I am getting a strange loop when entering it in the OS X Terminal, but I think I should get it to work. Thanks a lot for the code. `A_usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ... [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]` – Peter S Sep 09 '15 at 19:55
  • I created this using Bash 3.2.25 on a CentOS machine. Not sure if it works on OS X. It looks like the `date` command works differently on OS X. – Alvaro Flaño Larrondo Sep 09 '15 at 19:58
  • Hmm, yeah, I can't get it to work. I tried it on a Windows machine with Cygwin installed, but I think the time format is somehow a problem. – Peter S Sep 10 '15 at 09:10
  • OSX and Linux use different tools, so `date` is different on each one. You can either modify the `date` command I used (maybe use -f and -j flags) or install GNU date (check [here](http://stackoverflow.com/a/9805125/974822)) – Alvaro Flaño Larrondo Sep 10 '15 at 13:21

You can count the files per hour and show every hour that does not have 60 files. If the filenames are constructed exactly as stated in the question, you can use:

ls A_* | cut -d"-" -f1-3 | sort | uniq -c | grep -v " 60 "
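Once the one-liner has flagged an hour, a small follow-up can list the missing minutes inside it. A sketch, assuming the same A_YYYY-MM-DD_HH-MM names; `missing_in_hour` and `report_missing` are hypothetical helpers, and `awk '$1 != 60'` stands in for the `grep -v " 60 "` filter so that only the hour prefix is passed on:

```shell
#!/bin/bash
# Sketch: assumes filenames A_YYYY-MM-DD_HH-MM in the current directory.

# Print every missing file within one hour, e.g. missing_in_hour A_2015-01-01_00
missing_in_hour() {
    local m
    for m in $(seq -f "%02g" 0 59); do
        [ -e "$1-$m" ] || echo "$1-$m"
    done
}

# Count files per hour as in the one-liner, then expand each short hour.
report_missing() {
    ls A_* | cut -d"-" -f1-3 | sort | uniq -c |
        awk '$1 != 60 { print $2 }' |
        while read -r hour; do missing_in_hour "$hour"; done
}
```

The first and last hours of the recording are often partial, so they will be flagged as well, along with the minutes outside the recorded range.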
Walter A

awk to the rescue!

The following simple script does not handle the rollover to the next hour or day (each hour boundary shows up as a false gap), but it will give you the files on either side of each skipped minute:

$ ls -1 | awk -F- 'p+1!=$NF{print p0, $0} {p=$NF;p0=$0}'
A_2015-01-01_00-04 A_2015-01-01_00-06
A_2015-01-01_00-08 A_2015-01-01_00-10

The directory has these files:

$ ls -1
A_2015-01-01_00-01
A_2015-01-01_00-02
A_2015-01-01_00-03
A_2015-01-01_00-04
A_2015-01-01_00-06
A_2015-01-01_00-07
A_2015-01-01_00-08
A_2015-01-01_00-10

Otherwise, for a more robust solution you have to do some calendar calculations to account for month lengths, leap years, etc.
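One way to get those calendar calculations without writing them yourself is to let GNU `date` convert every timestamp to epoch seconds in a single call (`-f -` reads one date per line from standard input), then have awk compare neighbors as above. A sketch, assuming GNU date, bash (for process substitution), and A_YYYY-MM-DD_HH-MM names read as UTC; `gaps` is a hypothetical function name:

```shell
#!/bin/bash
# Sketch: assumes GNU date (-f - reads one date per line from stdin), bash
# process substitution, and A_YYYY-MM-DD_HH-MM names treated as UTC.

# Print the files on either side of every gap in the current directory.
gaps() {
    # Column 1: filename, column 2: its timestamp in epoch seconds
    paste -d' ' <(printf '%s\n' A_*) \
        <(printf '%s\n' A_* |
          sed 's/^A_\(.*\)_\(..\)-\(..\)$/\1 \2:\3/' |
          date -u -f - +%s) |
        awk 'NR > 1 && $2 - p != 60 { print pn, $1 } { p = $2; pn = $1 }'
}
```

Any two neighboring files that are not exactly 60 seconds apart are printed, so hour, day, month, and year rollovers no longer produce false gaps.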

karakfa

Your problem isn't so much reading from the directory to formulate your checks; rather, it is finding a way to generate the filenames between two dates so you can check whether any are missing. Bash isn't the fastest tool for this kind of checking, but for checking two months' worth of minutes once in a while, it will do.

There are many ways to approach the problem. One of the first that came to mind is to take the beginning and end filenames as arguments, generate the filenames between those dates, and check whether each file exists, reporting any that don't. The seq utility will generate the sequences needed. There are several other utilities that are a bit more flexible, but seq is ubiquitous.
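For instance, seq's `-f` format can include the fixed part of the filename, so a single call yields every expected name in an hour. A minimal sketch for one hour; `check_hour` is a hypothetical name, and the prefix passed in is assumed not to contain a `%`:

```shell
#!/bin/bash
# Report missing files in one hour, e.g. check_hour A_2015-01-01_00
check_hour() {
    local f
    # seq -f prints A_2015-01-01_00-00 ... A_2015-01-01_00-59
    for f in $(seq -f "$1-%02g" 0 59); do
        [ -e "$f" ] || printf " missing: %s\n" "$f"
    done
}
```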

Setting up the logic takes a multi-part approach: you need to determine what must be tested between the start and end filenames. For instance, if the start and end are less than an hour apart, you only need to check the changing minutes between them, and so on.

Below, I've given an example of the logic for handling a month's worth of files incremented by minutes. I've left the multi-month implementation to you if you find it necessary. If the format changes, just adjust the parameter expansion/substring removal used to parse each part of the date string. Give it a try:

#!/bin/bash
# usage: filepermin.sh A_YYYY-MM-DD_HH-MM A_YYYY-MM-DD_HH-MM
# Start and end must fall in the same month (see the note at the end).

[ $# -eq 2 ] || { printf "usage: %s start_file end_file\n" "$0" >&2; exit 1; }

fstart="$1"  # starting filename
fend="$2"    # ending filename

## initial trim of filename from left
ystart=${fstart#*_}     # start year
mstart=${ystart#*-}     # start month
dstart=${mstart#*-}     # start day
Hstart=${dstart#*_}     # start Hour
Mstart=${fstart##*-}    # start Minute

yend=${fend#*_}     # end year
mend=${yend#*-}     # end month
dend=${mend#*-}     # end day
Hend=${dend#*_}     # end Hour
Mend=${fend##*-}    # end Minute

## final trim of filename from right
ystart=${ystart%%-*}
mstart=${mstart%%-*}
dstart=${dstart%%_*}
Hstart=${Hstart%%-*}

yend=${yend%%-*}
mend=${mend%%-*}
dend=${dend%%_*}
Hend=${Hend%%-*}

if [ "$ystart-$mstart" != "$yend-$mend" ]; then
    printf "start and end must be in the same month\n" >&2
    exit 1
fi

## base filename w/o day (e.g. A_2015-01)
fday=${fstart%_*}
fday=${fday%-*}

## base filenames w/o time (e.g. A_2015-01-01)
sday=${fstart%_*}
eday=${fend%_*}

## check minutes $3..$4 of hour $2 on day prefix $1
check_minutes() {
    for M in $(seq -f "%02g" "$3" "$4"); do
        [ -e "$1_$2-$M" ] || printf " missing: %s\n" "$1_$2-$M"
        # printf " checking: %s\n" "$1_$2-$M"
    done
}

## check every minute of hours $2..$3 on day prefix $1
check_hours() {
    for H in $(seq -f "%02g" "$2" "$3"); do
        check_minutes "$1" "$H" 0 59
    done
}

## 10# forces base 10, so 08 and 09 are not rejected as invalid octal
if [ "$sday" = "$eday" ]; then                  ## start and end on the same day
    if ((10#$Hstart == 10#$Hend)); then         ## ... and in the same hour
        check_minutes "$sday" "$Hstart" "$Mstart" "$Mend"
    else
        check_minutes "$sday" "$Hstart" "$Mstart" 59
        check_hours "$sday" $((10#$Hstart + 1)) $((10#$Hend - 1))
        ## handle minutes in last hour
        check_minutes "$eday" "$Hend" 0 "$Mend"
    fi
else
    ## remaining hours in 1st day
    check_minutes "$sday" "$Hstart" "$Mstart" 59
    check_hours "$sday" $((10#$Hstart + 1)) 23
    ## full days between start and end (none if the span is < 48 hours)
    for d in $(seq -f "%02g" $((10#$dstart + 1)) $((10#$dend - 1))); do
        check_hours "${fday}-${d}" 0 23
    done
    ## last day, up to and including the end minute
    check_hours "$eday" 0 $((10#$Hend - 1))
    check_minutes "$eday" "$Hend" 0 "$Mend"
fi

## Add Year/Month Iteration here for multi-month ranges

printf "check complete\n"
exit 0

Above, the test `printf " checking: ..."` lines are commented out. With them enabled, the filenames generated for a range that crosses an hour (and day) boundary are:

Example Checks

$ bash filepermin.sh A_2015-01-01_23-50 A_2015-01-02_00-15
 checking: A_2015-01-01_23-50
 checking: A_2015-01-01_23-51
 checking: A_2015-01-01_23-52
 checking: A_2015-01-01_23-53
 checking: A_2015-01-01_23-54
 checking: A_2015-01-01_23-55
 checking: A_2015-01-01_23-56
 checking: A_2015-01-01_23-57
 checking: A_2015-01-01_23-58
 checking: A_2015-01-01_23-59
 checking: A_2015-01-02_00-00
 checking: A_2015-01-02_00-01
 checking: A_2015-01-02_00-02
 checking: A_2015-01-02_00-03
 checking: A_2015-01-02_00-04
 checking: A_2015-01-02_00-05
 checking: A_2015-01-02_00-06
 checking: A_2015-01-02_00-07
 checking: A_2015-01-02_00-08
 checking: A_2015-01-02_00-09
 checking: A_2015-01-02_00-10
 checking: A_2015-01-02_00-11
 checking: A_2015-01-02_00-12
 checking: A_2015-01-02_00-13
 checking: A_2015-01-02_00-14
 checking: A_2015-01-02_00-15
check complete

Actual Test (with A_2015-01-01_00-31 missing)

As a short test, 120 files were created with:

$ touch A_2015-01-01_00-{00..59}
$ touch A_2015-01-01_01-{00..59}

Deleting A_2015-01-01_00-31 and running the test yielded:

$ bash ../filepermin.sh A_2015-01-01_00-00 A_2015-01-01_01-59
 missing: A_2015-01-01_00-31
check complete

Note: there are probably several other ways to generate the sequences needed; this is just one example of an approach. Another option is to read all the filenames into an array and do a sequential check for any names that are more than 1 minute apart. However, you then run into issues with native file sorting, and with the fact that two months' worth of minutes is 80K+ filenames (60 × 24 × 61 = 87,840). That's getting into the range where bash can get very slow.

Check by Reading Files Into Array

If you are inclined to read the files into an array, a much shorter approach can be taken, with two caveats: native sort order may present a problem, and this approach finds the files surrounding a missing file rather than the missing filename itself. Simply change to the directory containing the files and try something like:

#!/bin/bash

a=( A_* )                   ## only the timestamped files
for ((i = 1; i < ${#a[@]}; i++)); do

    n=${a[i]}               ## next date
    n=${n##*-}              ## keep the minute field
    n=${n/#0/}              ## drop a leading zero

    p=${a[$((i-1))]}        ## prev date
    p=${p##*-}
    p=${p/#0/}

    [ "$n" -eq 0 ] && n=60  ## adjust for test on roll to next hour

    (((n - p) != 1)) && echo "file missing prior to ${a[i]}"

done

If the minutes of the next and previous filenames differ by anything other than 1, the script flags a file as missing prior to the current filename. For example, removing A_2015-01-01_01-00 from a sequence of files would trigger:

$ bash ../fpm.sh
file missing prior to A_2015-01-01_01-01
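That check only compares the minute field, so a gap of exactly one hour (e.g. 00-59 followed by 02-00) passes unnoticed. A sketch of a variation that compares minute-of-day (HH × 60 + MM) instead, assuming A_YYYY-MM-DD_HH-MM names; `find_gaps` is a hypothetical name, and a gap spanning a whole number of days would still go undetected:

```shell
#!/bin/bash
# Variation on the array check: compare minute-of-day instead of just MM,
# so a whole missing hour is caught too. Assumes A_YYYY-MM-DD_HH-MM names.
find_gaps() {
    local a=( A_* ) i t n p
    for ((i = 1; i < ${#a[@]}; i++)); do
        t=${a[i]##*_}                       # HH-MM of the current file
        n=$((10#${t%-*} * 60 + 10#${t#*-}))
        t=${a[i-1]##*_}                     # HH-MM of the previous file
        p=$((10#${t%-*} * 60 + 10#${t#*-}))
        # Modulo 1440 lets 23-59 -> 00-00 count as one minute apart.
        if (( (n - p + 1440) % 1440 != 1 )); then
            echo "file missing prior to ${a[i]}"
        fi
    done
}
```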
David C. Rankin