
I have a directory that contains access_log files for an Atlassian product. The files are named like access_log.2017-11-02. I have a log parser written in bash that parses all the data into a .csv file, but I cannot think of a way to make it look only at the files within a date range passed as parameters, e.g. between access_log.2017-11-02 and access_log.2017-11-20. My code is below. Any help would be appreciated.

PS. I am very new to bash, so I apologize for the mess in the script. Also, I am using a bash port for Windows.

#!/bin/bash
# Directory that holds the access_log.YYYY-MM-DD files
DIR="C:/Users/userid/Desktop/UAT_log_files"
for f in "$DIR"/access_log.*
do
echo "Processing $f file.."
# Strip '[' and ',' so the fields can be joined with commas later
sed 's|[[[,]||g' "$f" > "$f.temp"

LOG="$f.temp"
echo "Line Number,clientip,requestid,user,date,request,method,response,bytes,request_time,referrer,HTTP_Client & session_id" > "$f.csv"

# Truncate field 13 to 100 characters, then write the fields out as one CSV row per log line
awk  '{if(length($13)>100) $13=substr($13,1,100);print NR-0 "," $1","$2","$3","$4","$6","$7" "$8" "$9","$10","$11" , " $12", " $13" ,  " $14"  " $15"  " $16" " $17" " $18" " $19" " $20" " $21" " $22" " $23" " $24" " $25" " $26" " $27" " $28}' "$LOG" >> "$f.csv"

# Remove only the temporary file that belongs to this log
rm "$f.temp"
done
# Merge the per-file CSVs into a single temporary file
cat "$DIR"/access_log.*.csv > mainlog_temp.csv
echo "Deleting the temporary files now.."
rm "$DIR"/access_log.*.csv
echo "fixing the date time format"
awk -f redate.awk mainlog_temp.csv > mainlog.csv
sed 's/--date ::/date/g' mainlog.csv > new.csv ; mv new.csv mainlog.csv
rm mainlog_temp.csv
echo "Done! The file mainlog.csv has been created in the current directory"
Manish Jha

2 Answers


Use bash brace expansion with a sequence expression of the form

{<START>..<END>}

For example

for file in "C:/Users/manishj/Desktop/UAT_log_files/access_log.2017-11-"{02..19}
# The above expands to access_log.2017-11-02, access_log.2017-11-03 and so on.
do
  #required operation on "$file"
done
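
To see what the `{START..END}` form produces before running the parser on it, the expansion can be captured in an array and printed. This is only a sketch: the directory name `logs` is a placeholder, not the path from the question, and the zero-padded form `{02..05}` assumes bash 4 or later.

```shell
# Hypothetical directory name, used only for illustration.
dir="logs"

# Brace expansion generates one word per day; each word is then
# subject to the usual variable expansion of "$dir".
files=( "$dir/access_log.2017-11-"{02..05} )

# Print one generated name per line.
printf '%s\n' "${files[@]}"
```

Note that brace expansion runs before variable expansion, so `{$start..$end}` is not turned into a sequence. The bounds have to be written literally in this form.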

Edit

If brace expansion doesn't work on the Windows port of bash, then use a C-style for loop

for((i=2;i<=19;i++)) # for files 02 to 19
do
file="C:/Users/manishj/Desktop/UAT_log_files/access_log.2017-11-$(printf "%02d" "$i")"
# Above, $file takes the values access_log.2017-11-02, access_log.2017-11-03 and so on.
# Do the operation with "$file". Make sure you put it in double quotes.
done

Here we employ bash command substitution.
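
As a small illustration of the command substitution used above, the zero-padding step can be isolated in a helper. The function name `log_name` is made up for this sketch, and the year and month are fixed only to match the example.

```shell
# log_name DAY: print the file name for a day number, zero-padded to two digits.
# (Hypothetical helper; the 2017-11 prefix is hard-coded for the example.)
log_name() {
    printf 'access_log.2017-11-%02d\n' "$1"
}

for ((i = 8; i <= 10; i++)); do
    # $(...) captures what log_name prints and stores it in the variable.
    file="$(log_name "$i")"
    echo "$file"
done
```

The loop counter should be a plain integer such as 8. A value written with a leading zero, such as 08, is read as octal and rejected by bash.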

Edit 2

If C-style for loops are not supported either, then fall back to a traditional while loop

no=2
while [ "$no" -le 19 ]
do
file="C:/Users/manishj/Desktop/UAT_log_files/access_log.2017-11-$(printf "%02d" "$no")"
    # Above, $file takes the values access_log.2017-11-02, access_log.2017-11-03 and so on.
    # Do the operation with "$file". Make sure you put it in double quotes.
no=$((no+1)) # increment no
done
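
The loops above hard-code the day numbers and stay within one month. If the range has to come from parameters, one possible sketch (not part of the loops above) relies on the fact that dates written as YYYY-MM-DD sort the same way as strings and as dates. The function name `in_range` and the sample file names are assumptions made for this illustration.

```shell
# in_range START END FILE
# Succeeds when the date suffix of FILE lies within START..END, both inclusive.
# All dates are assumed to be in YYYY-MM-DD form.
in_range() {
    local start=$1 end=$2
    local d=${3##*.}    # keep what follows the last dot: the date
    [[ ! "$d" < "$start" && ! "$d" > "$end" ]]
}

# Sample names only; a real script would loop over the log directory instead.
for f in access_log.2017-10-31 access_log.2017-11-02 \
         access_log.2017-11-20 access_log.2017-11-21; do
    if in_range 2017-11-02 2017-11-20 "$f"; then
        echo "would parse $f"
    fi
done
```

In a script, the two bounds could be taken from "$1" and "$2", so that the same loop works across month and year boundaries.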
sjsam

Extract the date part of a file name. (The name is assumed to look like access_log.2017-11-02.)

extractDate() { echo "${1##*.}"; }  # strip everything up to and including the last dot

Convert a date to a Unix timestamp (the --date option requires GNU date).

toStamp() { date --date="$1" +%s; }

Check whether a date lies within the bounds.

# [$1, $2): bounds (start inclusive, end exclusive)
# $3: date to check

isBetween() {
    [[ $(toStamp "$1") -le $(toStamp "$3") ]] &&
    [[ $(toStamp "$2") -gt $(toStamp "$3") ]]
}

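Test.

# Named check rather than test, so that the shell's built-in test command is not shadowed.
check() { isBetween "$1" "$2" "$3" && echo true || echo false; }

# Full dates are used here: GNU date reads a bare four-digit number
# such as 2010 as the time 20:10, not as a year.
check 2017-11-10 2017-11-20 2017-11-09   # false
check 2017-11-09 2017-11-20 2017-11-09   # true
check 2017-11-20 2017-11-20 2017-11-09   # false
check 2017-11-08 2017-11-20 2017-11-09   # true
check 2017-11-21 2017-11-20 2017-11-09   # false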
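
The helpers above can be combined to pick out the log files in a range, which is what the question asks for. The following is a minimal sketch, assuming GNU date is available in the Windows port. `selectLogs` is a hypothetical name, and `toStamp` and `isBetween` are restated so that the block runs on its own.

```shell
# Convert a date to seconds since the epoch (GNU date).
toStamp() { date --date="$1" +%s; }

# isBetween START END DATE: succeeds when START <= DATE < END.
isBetween() {
    (( $(toStamp "$1") <= $(toStamp "$3") && $(toStamp "$3") < $(toStamp "$2") ))
}

# selectLogs START END FILE...
# Print each FILE whose date suffix falls in [START, END).
selectLogs() {
    local start=$1 end=$2 f
    shift 2
    for f in "$@"; do
        # ${f##*.} keeps what follows the last dot, i.e. the date.
        if isBetween "$start" "$end" "${f##*.}"; then
            printf '%s\n' "$f"
        fi
    done
}

# Sample call with made-up file names.
selectLogs 2017-11-02 2017-11-04 \
    access_log.2017-11-01 access_log.2017-11-02 \
    access_log.2017-11-03 access_log.2017-11-04
```

Because the end bound is exclusive, including access_log.2017-11-20 means passing 2017-11-21 as the end date. In the parser, the printed names could feed the processing loop in place of the directory glob.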
ntj