3

My script (in bash) aims to do the following:

  1. Get the start and stop times from a file, file_A. The time range is usually 3-24 hours.

  2. Based on the time window [start_time, stop_time] taken from file_A, find the relevant files among roughly 10k log files (a number that will keep growing as the experiment runs), each of which covers about 30 minutes. That is, I have to pick out 6-50 log files from the 10k.

  3. Once the correct log files are identified, print out the data of interest.

Steps 1) and 3) are already done. Right now I'm stuck on step 2), in two places in particular:

(a) How to select the appropriate files efficiently by their names, since the log files are named after their timestamps. Each log file is named like log_201305280650, which means 2013 / May 28 / 06:50. In other words, based on the time taken from file_A, I need to identify the corresponding log files from their names, which encode the time.

(b) Once the files are selected, read only the entries (temperature, pressure, etc.) whose times fall inside the window. Each file covers 30 minutes, so some of its entries will be outside the window.

For instance,

From step 1), my time window is [201305280638, 201305290308].

From step 2), I know the log file log_201305280650 contains the start time 201305280638. So I need to read the temperature and pressure only for the entries from 201305280638 onward, i.e. the rows at or below that time in the file.

    The log file's name is log_201305280650 (= 2013 / May 28 / 06:50)

    Time             temperature  pressure ...
    201305280628     100,         120      ...
    201305280629     100,         120      ...
    ...              ...          ...
    201305280638     101,         121      ...
    201305280639     99,          122      ...
    ...              ...          ...
    201305280649     101,         119      ...
    201305280650     102,         118      ...

My pseudocode is as follows.

get time_start from /path/file_A
get time_stop  from /path/file_A

for file in /path_to_log_files/*
do
    case "$file" in
    *)
        if [[ log file name within time window of (time_start, time_stop) ]]; then
            # loop over this file to get the entries whose time is within (time_start, time_stop)
            # read out temperature, pressure, etc.
        fi
        ;;
    esac
done
user2740039
    You might want to read the [formatting help](http://stackoverflow.com/editing-help) and clean this up a bit more. It's not at all clear what your actual question is. – millimoose Sep 08 '13 at 23:15
  • Why does your `case` statement have `*)` twice? – Barmar Sep 08 '13 at 23:19
  • 2
    TMI. You'd be more likely to get help if you provide an SSCCE: http://sscce.org. – EJK Sep 09 '13 at 00:49
  • Indeed, it's not clear enough. But I somehow tried my best to make it clear. – user2740039 Sep 09 '13 at 07:31
  • 1
    Hi guys: Thanks a lot for your attention! I have modified my post a lot. Hopefully it looks much clearer than before. Best! – user2740039 Sep 09 '13 at 08:21

3 Answers

0

Quite a job using bash. Perl or Python would have been easier; they both have date/time modules.

I spent a while doing the usual date slicing and it was horrible, so I cheated and used file timestamps instead. Bash has some limited timestamp checking, and this uses that. OK, it does some file IO, but these are empty files and what the hell!

lower=201305280638
upper=201305290308
filename=log_201305280638
filedate=${filename:4}

if (( filedate == upper )) || (( filedate == lower ))
then
    echo "$filename within range"
else
    # range marker files (empty, used only for their timestamps)
    touch -t "$lower" lower.$$
    touch -t "$upper" upper.$$

    # benchmark file carrying the log file's timestamp
    touch -t "$filedate" file.$$

    if [[ file.$$ -nt upper.$$ ]]
    then
        echo "$filename is too young"

    elif [[ file.$$ -ot lower.$$ ]]
    then
        echo "$filename is too old"
    else
        echo "$filename is just right"
    fi

    rm lower.$$ upper.$$ file.$$
fi

-nt is "newer-than"

-ot is "older-than"

Hence the check for equality at the start. You can use a similar check for the timestamps within the file (your second issue). But honestly, can't you use perl or python?
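
To make it concrete, here is a rough, untested sketch of how that idea could be wrapped around all the log files. The directory path is taken from the question's pseudocode; the per-line filtering at the end is my own assumption rather than part of the answer above, and since the timestamps inside a file are plain numbers, a direct arithmetic comparison is simpler there than the temp-file trick:

lower=201305280638
upper=201305290308

# marker files carrying the window boundaries as timestamps
touch -t "$lower" lower.$$
touch -t "$upper" upper.$$

for f in /path_to_log_files/log_*            # path from the question's pseudocode
do
    stamp=${f##*log_}                        # e.g. 201305280650
    touch -t "$stamp" file.$$                # marker file carrying this log's timestamp

    # skip files whose name falls outside the window
    if [[ file.$$ -nt upper.$$ || file.$$ -ot lower.$$ ]]
    then
        continue
    fi

    # file is inside the window: keep only the entries whose own time fits
    while read -r t temp press rest
    do
        [[ $t =~ ^[0-9]{12}$ ]] || continue  # skip headers and non-data lines
        (( t >= lower && t <= upper )) && echo "$t ${temp%,} $press"
    done < "$f"
done

rm -f lower.$$ upper.$$ file.$$

Note that a filename timestamp equal to either bound falls through to the "inside" branch here, so this variant needs no separate equality check.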

cdarke
  • Hi @cdarke, thanks a lot for your response and time! To be honest, I'm not qualified to evaluate your code given my limited bash skills. However, according to this link, http://stackoverflow.com/questions/78493/what-does-mean-in-the-shell, most people don't encourage using "$$". I'm not saying your code is bad - I can't judge, really. But it's clear that you strongly suggest Python or Perl. I will consider this seriously and try to figure out the best solution (I have a small background in Python). Anyway, thanks a lot again! – user2740039 Sep 09 '13 at 19:21
  • @user2740039: OK, but using $$ makes the code simpler and personally I think it is justified in this case. You don't have to use it. The point is that your timestamp format is exactly right for `touch`, and it seems a shame to waste that coincidence (if it is one). – cdarke Sep 10 '13 at 08:30
0

Maybe something along the lines of this would work for you? I am using $start and $end for the start and end times from file_A.

 eval cat log_{$start..$end} 2> /dev/null | sort -k1 | sed -n "/$start/,/$end/p"

This assumes that your log files are in the format

time temperature pressure ...

with no headers or other such text
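
With the example window from the question plugged in, it would look something like this (a rough illustration, not tested; the eval is only there so the brace range expands after $start and $end have been substituted, and 2>/dev/null hides the errors for the many names in the range that don't exist):

start=201305280638
end=201305290308
# expands (after eval) to: cat log_201305280638 log_201305280639 ... log_201305290308
eval cat log_{$start..$end} 2> /dev/null | sort -k1 | sed -n "/$start/,/$end/p"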

Aaron Okano
0

It may be easier to use awk and the +"%s" option of the date command instead of literal dates and times. This option converts a date/time to seconds since the epoch (01-01-1970). The resulting number is easy to work with; after all, it's just a number. As an example I made a small bash script. First, a simulation:

#!/bin/bash

# simulation: date and time
start_dt="2013-09-22 00:00:00"
end_dt="2013-09-23 00:00:00"   # one day later, so the simulation actually produces data
start_secs=$(date -d "$start_dt" +"%s")
end_secs=$(date -d "$end_dt" +"%s")
# simulation: set up table (time in secs, temperature, pressure per minute)
> logfile
for ((i=start_secs; i<end_secs; i=i+60)); do
    echo "$i $((90 + RANDOM % 20)) $((80 + RANDOM % 30))" >> logfile
done

Here's the actual script to get the user range and to print it out:

echo "Enter start of range:"
read -p "Date (YYYY-MM-DD): "sdate
read -p "Time (HH:MM:SS)  : "stime
echo "Enter end of range:"
read -p "Date (YYYY-MM-DD): "edate
read -p "Time (HH:MM:SS)  : "etime
#convert to secs
rstart=$(date -d "$sdate $stime" +"%s")
rend=$(date -d "$edate $etime" +"%s")
#print it to screen
awk -v rstart=$rstart -v rend=$rend '{if($1 >= rstart && $1 <= rend)print $0}' logfile

The awk command is well suited for this; it is fast and can handle large files. I hope this gives you some ideas.
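
The timestamps in the question are in YYYYMMDDhhmm form rather than epoch seconds, so they would need a small conversion step before this approach applies. A minimal sketch, assuming GNU date (the variable names are just for illustration):

# convert a YYYYMMDDhhmm stamp such as 201305280638 to seconds since the epoch
stamp=201305280638
secs=$(date -d "${stamp:0:4}-${stamp:4:2}-${stamp:6:2} ${stamp:8:2}:${stamp:10:2}" +"%s")
echo "$secs"

Alternatively, since the 12-digit stamps already sort and compare correctly as plain numbers, the same awk filter could be run on them directly without any conversion.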

linuph