
I am processing several files at a time, each of which has summary stats. At the end of the process I want to create a summary file that adds up all the stats. I already know how to dig the stats out of the log files, but I want to be able to add the numbers and echo the result to another file. Here is what I use to dig out the times:

find . -iname "$srch1*" -exec grep "It took" {} \; -print

The output looks like this:

It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000010-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log
It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log
It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log
It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log
It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 40 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log

What I want is something like this:

Summary
filepart000006-20140204-154923.dat.gz.log  0 hours, 11 minutes and 40 seconds

Then find the longest time among them and output a message like:

 Total time taken = ____________

I am running the jobs in parallel, so the total time taken is the longest one.

Then do some calculations like this:

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print

        Processed Files:   7936635
./filename-20131102-part000000-20140204-153310.dat.gz.log
        Processed Files:   3264805
./filename-20131102-part000001-20140204-153310.dat.gz.log
        Processed Files:   1607547
./filename-20131102-part000008-20140204-153310.dat.gz.log
        Processed Files:   3180478
./filename-20131102-part000003-20140204-153310.dat.gz.log
        Processed Files:   1595497
./filename-20131102-part000007-20140204-153310.dat.gz.log
        Processed Files:   1568532
./filename-20131102-part000009-20140204-153310.dat.gz.log
        Processed Files:   3259884
./filename-20131102-part000002-20140204-153310.dat.gz.log
        Processed Files:   3141542
./filename-20131102-part000004-20140204-153310.dat.gz.log
        Processed Files:   3124221
./filename-20131102-part000005-20140204-153310.dat.gz.log
        Processed Files:   3136845
./filename-20131102-part000006-20140204-153310.dat.gz.log

And if I want just the metrics:

( find . -iname "dl-aster-full-20131102*" -exec grep "Processed Files" {} \;) | cut -d":" -f2
   7936635
   3264805
   1607547
   3180478
   1595497
   1568532
   3259884
   3141542
   3124221
   3136845

Based on the two outputs above, I just want to create a summary file:

Filename                                                  Processed files 
filename-20131102-part000000-20140204-153310.dat.gz.log   7936635

... followed by a total, which is all of the above added together:

   ( 7936635 +
   3264805 +
   1607547 +
   3180478 +
   1595497 +
   1568532 +
   3259884 +
   3141542 +
   3124221 +
   3136845 ) as

 Total Files = ____________

So overall, like this:

Filename                                                  Processed files 
filename-20131102-part000000-20140204-153310.dat.gz.log   7936635
Total Files = ____________ ( sum of all above ) 

All that needs to be done is to get the output in this format:

 Filename                                                  Processed files 
    filename-20131102-part000000-20140204-153310.dat.gz.log   7936635

In my command above, the filename and the value end up on different lines. After that, I need to sum the numbers that were output.

My questions are:

-- How can I perform addition like the above, using anything? I'd avoid Perl, since I am not sure it will be installed everywhere the shell is run.
-- How can I format the output like the above? I already know how to extract the values.

user1874594

1 Answer


With the sed command below you can get the filename and the grep result on one line; the rest should then be easy. (This assumes grep matches exactly one line per file.)

find . -iname "$srch1*" -exec grep "It took" {} \; -print |sed -r 'N;s/(.*)\n(.*)/\2 \1/'

./filepart000010-20140204-154923.dat.gz.log    It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 40 seconds to process that file.


find . -iname "$srch*" -exec grep "Processed Files" {} \; -print| sed -r 'N;s/(.*)\n(.*)/\2 \1/' 
./filename-20131102-part000000-20140204-153310.dat.gz.log         Processed Files:   7936635
./filename-20131102-part000001-20140204-153310.dat.gz.log         Processed Files:   3264805
./filename-20131102-part000008-20140204-153310.dat.gz.log         Processed Files:   1607547
./filename-20131102-part000003-20140204-153310.dat.gz.log         Processed Files:   3180478
./filename-20131102-part000007-20140204-153310.dat.gz.log         Processed Files:   1595497
./filename-20131102-part000009-20140204-153310.dat.gz.log         Processed Files:   1568532
./filename-20131102-part000002-20140204-153310.dat.gz.log         Processed Files:   3259884
./filename-20131102-part000004-20140204-153310.dat.gz.log         Processed Files:   3141542
./filename-20131102-part000005-20140204-153310.dat.gz.log         Processed Files:   3124221
./filename-20131102-part000006-20140204-153310.dat.gz.log         Processed Files:   3136845
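
As an alternative to pairing lines with sed, grep itself can print the filename next to each match. This is only a sketch: it assumes your grep supports -H (GNU and BSD grep do, but it is not required by POSIX) and that the filenames contain no colon. find_matches is a made-up helper name, not something from the question.

```shell
# Hypothetical helper: find_matches <name-prefix> <pattern>
# Prints "./filename matched-line", one line per match.
# Assumes grep supports -H and that filenames contain no ":".
find_matches() {
    # -H forces "filename:match" output even when grep gets a single file;
    # sed then turns the first ":" (the one grep added) into a space.
    find . -iname "$1*" -exec grep -H "$2" {} + | sed 's/:/ /'
}
```

For example, find_matches "$srch1" "It took" > temp1 should give the same one-line-per-file layout as the sed version, and it does not break if a file happens to contain more than one matching line.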

If you need to calculate the longest time and the total time, use the script below (you should be able to format the output yourself).

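find . -iname "$srch1*" -exec grep "It took" {} \; -print |sed -r 'N;s/(.*)\n(.*)/\2 \1/' > temp1
# s2t converts x seconds into the global variables h, m and s
awk 'function s2t(x) { h=int(x/3600); m=int((x-h*3600)/60); s=x-h*3600-m*60 }
# $4, $6 and $9 are the hours, minutes and seconds on each line of temp1
{ a=$4*3600+$6*60+$9; max=(a>max)?a:max; t+=a }
END { s2t(max); print "max is",h,m,s
      s2t(t); print "sum is " ,h,m,s }' temp1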

max is 0 59 38
sum is  3 46 27
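
To get closer to the summary layout asked for in the question, the same field positions can be reused. A rough sketch, assuming the temp1 layout shown above (filename in $1, no spaces in filenames); summarize_times is a made-up name:

```shell
# Hypothetical helper: reads temp1-style lines on stdin, i.e.
#   ./file It took H hours, M minutes and S seconds to process that file.
# and prints one summary line per file plus the longest time.
summarize_times() {
    awk '
    BEGIN { print "Summary" }
    {
        secs = $4 * 3600 + $6 * 60 + $9      # hours, minutes, seconds
        name = $1
        sub(/^\.\//, "", name)               # drop the leading "./"
        printf "%s  %d hours, %d minutes and %d seconds\n", name, $4, $6, $9
        if (secs > max) max = secs           # jobs run in parallel: keep the longest
    }
    END {
        h = int(max / 3600); m = int((max % 3600) / 60); s = max % 60
        printf "Total time taken = %d hours, %d minutes and %d seconds\n", h, m, s
    }'
}
```

Usage would be something like summarize_times < temp1 > summary.txt. Since the jobs run in parallel, only the maximum is reported here, not the sum.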

For the second one:

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print| sed -r 'N;s/(.*)\n(.*)/\2 \1/'  > temp2
awk '{sum+=$NF}END{print "Total Files = ", sum}' temp2

Total Files =  31815986
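
If you also want the "Filename / Processed files" table from the question, one possible sketch is below. It assumes the temp2 layout shown above (filename first, count last, no spaces in filenames); summarize_counts and the column width of 60 are arbitrary choices of mine.

```shell
# Hypothetical helper: reads temp2-style lines on stdin, i.e.
#   ./file         Processed Files:   7936635
# and prints a two-column table followed by the total.
summarize_counts() {
    awk '
    BEGIN { printf "%-60s %s\n", "Filename", "Processed files" }
    {
        name = $1
        sub(/^\.\//, "", name)               # drop the leading "./"
        printf "%-60s %d\n", name, $NF       # $NF is the count at the end of the line
        sum += $NF
    }
    END { printf "Total Files = %d\n", sum }'
}
```

Run it as summarize_counts < temp2 > summary.txt. With the ten files above, the last line should again be the 31815986 total.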
BMW
  • Thanks. How do I compare the times, e.g. 0 hours, 11 minutes and 40 seconds, and find the longest one? To get the total files, should I just push those numbers into an array and add the contents, or is there a better approach? – user1874594 Feb 06 '14 at 04:22
  • Updated. Try to do something by yourself first; you will learn quicker that way. – BMW Feb 06 '14 at 04:34
  • Thanks galore, I will. – user1874594 Feb 06 '14 at 11:57