I am processing several files at a time, each of which has summary stats. At the end of the process I want to create a summary file that adds up all the stats. I already know how to dig the stats out of the log files, but I want to be able to add the numbers and echo the result to another file. Here is what I use to dig out the times:
find . -iname "$srch1*" -exec grep "It took" {} \; -print
The output looks like this:
It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000010-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log
It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log
It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log
It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log
It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 40 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log
What I want is something like this:
Summary
filepart000006-20140204-154923.dat.gz.log 0 hours, 11 minutes and 40 seconds
Then find the LONGEST time among them and output a message like:
Total time taken =____________
I am running the jobs in parallel, so the total time taken is the longest individual time.
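For the timing part, a minimal sketch of one possible approach using only find, grep and awk (no Perl) could look like the following. The function name `time_summary` and its two arguments (a directory and a filename pattern) are hypothetical, and the sketch assumes every matching line has exactly the "It took H hours, M minutes and S seconds" wording shown above. Passing `/dev/null` as an extra file is a common trick to make grep prefix each match with its filename.

```shell
#!/bin/sh
# Sketch: print one "filename time" line per log, then the longest time.
# time_summary is a hypothetical helper; $1 = directory, $2 = name pattern.
time_summary() {
    dir=$1
    pattern=$2
    echo "Summary"
    # /dev/null as an extra operand forces grep to print "path:" prefixes.
    find "$dir" -iname "$pattern" -exec grep "It took" /dev/null {} + |
    awk '
    {
        # Input line: <path>:It took H hours, M minutes and S seconds to ...
        sep = index($0, ":It took ")
        if (sep == 0) next
        path = substr($0, 1, sep - 1)
        msg  = substr($0, sep + 1)
        sub(/^.*\//, "", path)          # drop leading directories
        split(msg, w, " ")              # w[3]=hours, w[5]=minutes, w[8]=seconds
        h = w[3] + 0; m = w[5] + 0; s = w[8] + 0
        secs = h * 3600 + m * 60 + s
        printf "%s %d hours, %d minutes and %d seconds\n", path, h, m, s
        # Jobs run in parallel, so the overall time is the maximum.
        if (secs > max) { max = secs; mh = h; mm = m; ms = s }
    }
    END {
        printf "Total time taken = %d hours, %d minutes and %d seconds\n", mh, mm, ms
    }'
}
```

Something like `time_summary . "$srch1*" > summary.txt` would then write the result to a file (`summary.txt` is just a placeholder name). Comparing in seconds avoids having to compare hours, minutes and seconds separately.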
Then do some calculations like this:
find . -iname "$srch*" -exec grep "Processed Files" {} \; -print
Processed Files: 7936635
./filename-20131102-part000000-20140204-153310.dat.gz.log
Processed Files: 3264805
./filename-20131102-part000001-20140204-153310.dat.gz.log
Processed Files: 1607547
./filename-20131102-part000008-20140204-153310.dat.gz.log
Processed Files: 3180478
./filename-20131102-part000003-20140204-153310.dat.gz.log
Processed Files: 1595497
./filename-20131102-part000007-20140204-153310.dat.gz.log
Processed Files: 1568532
./filename-20131102-part000009-20140204-153310.dat.gz.log
Processed Files: 3259884
./filename-20131102-part000002-20140204-153310.dat.gz.log
Processed Files: 3141542
./filename-20131102-part000004-20140204-153310.dat.gz.log
Processed Files: 3124221
./filename-20131102-part000005-20140204-153310.dat.gz.log
Processed Files: 3136845
./filename-20131102-part000006-20140204-153310.dat.gz.log
And if I want just the metrics:
( find . -iname "dl-aster-full-20131102*" -exec grep "Processed Files" {} \;) | cut -d":" -f2
7936635
3264805
1607547
3180478
1595497
1568532
3259884
3141542
3124221
3136845
Based on the two outputs above, I just want to create a summary file:
Filename Processed files
filename-20131102-part000000-20140204-153310.dat.gz.log 7936635
.... then a summary line which is all of the above added together:
( 7936635 + 3264805 + 1607547 + 3180478 + 1595497 + 1568532 + 3259884 + 3141542 + 3124221 + 3136845 ) as
Total Files = ____________
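The addition itself does not need anything beyond the shell: POSIX arithmetic expansion `$(( ))` can accumulate a total line by line. A minimal sketch, where `sum_numbers` is a hypothetical helper name and the input is assumed to be one plain decimal integer per line (leading spaces are fine, but leading zeros would be read as octal):

```shell
#!/bin/sh
# Sketch: add up integers read from standard input, one per line.
# sum_numbers is a hypothetical helper name.
sum_numbers() {
    total=0
    while read -r n; do
        [ -n "$n" ] || continue      # skip blank lines
        total=$((total + n))         # POSIX shell integer arithmetic
    done
    echo "$total"
}
```

Fed with the ten numbers listed above, this should print 31815986. It could be combined with the extraction command along the lines of `echo "Total Files = $( ( find . -iname "$srch*" -exec grep "Processed Files" {} \; ) | cut -d":" -f2 | sum_numbers )"`.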
So overall, something like this:
Filename Processed files
filename-20131102-part000000-20140204-153310.dat.gz.log 7936635
Total Files = ____________ ( sum of all above )
All that needs to be done is to get the output in this format:
Filename Processed files
filename-20131102-part000000-20140204-153310.dat.gz.log 7936635
With my command above, the filename and the count end up on different lines; I want them on one line, and then I want to sum the numbers already output.
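A sketch of how the one-line format and the total might be produced in a single pass with awk follows. The function name `files_summary` and its arguments are hypothetical, and it assumes the log paths contain no `:` character, since the colon is used as the field separator. As before, `/dev/null` is passed only so that grep prefixes each match with the filename.

```shell
#!/bin/sh
# Sketch: print "filename count" per log, then the sum of all counts.
# files_summary is a hypothetical helper; $1 = directory, $2 = name pattern.
files_summary() {
    dir=$1
    pattern=$2
    # grep output: <path>:Processed Files: <count>
    find "$dir" -iname "$pattern" -exec grep "Processed Files" /dev/null {} + |
    awk -F: '
    BEGIN { print "Filename Processed files" }
    {
        path = $1
        sub(/^.*\//, "", path)      # drop leading directories
        count = $NF + 0             # last field, converted to a number
        print path, count
        total += count
    }
    END { printf "Total Files = %d\n", total }'
}
```

A call such as `files_summary . "$srch*" >> summary.txt` would append the table to a summary file (the file name is a placeholder).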
My questions are:
-- How can I perform addition like the above, using anything? I'd avoid Perl, since I am not sure it would be installed everywhere the shell is run.
-- How can I format the output like the above? I already know how to extract the values.