
I have a log file that contains the execution start and end times of various threads. I have done half the work, but I need some help to finish the other half. I wrote this command:

cat 2017-05-15.log | grep 'Executing ETL' | tr -s ' ' | cut -f2,3,4,5,8 -d' ' | sort -k5 -n

which produces the following output:

15 May 2017 03:43:40 696
15 May 2017 03:44:35 696
15 May 2017 03:45:02 696
15 May 2017 23:30:22 9502
15 May 2017 23:49:40 9502
15 May 2017 23:50:50 9502
15 May 2017 23:51:11 9502
15 May 2017 23:52:11 9502
15 May 2017 23:52:42 9502
15 May 2017 02:18:32 12795
15 May 2017 02:19:35 12795
15 May 2017 02:20:02 12795
15 May 2017 02:33:39 13674
15 May 2017 02:35:13 13674
15 May 2017 02:35:42 13674
15 May 2017 18:52:28 19143
15 May 2017 18:53:01 19143
15 May 2017 18:53:35 19143
15 May 2017 18:53:59 19143
15 May 2017 18:54:40 19143

This output is sorted on the process ID, which is the last column. The first occurrence of each process ID is that process's start time, and the last occurrence is its end time. I need to display only the first (start time) and last (end time) entry for each process, like this:

15 May 2017 03:43:40 696
15 May 2017 03:45:02 696
15 May 2017 23:30:22 9502
15 May 2017 23:52:42 9502
15 May 2017 02:18:32 12795
15 May 2017 02:20:02 12795
15 May 2017 02:33:39 13674
15 May 2017 02:35:42 13674
15 May 2017 18:52:28 19143
15 May 2017 18:54:40 19143

The number of entries for each process ID is not fixed. The output need not be in exactly this format, but I need to be able to clearly see the start and end times of each process.

RodrikTheReader

2 Answers

If the PIDs are never interleaved, then this is rather simple: we keep track of the last line and the PID on it, and print both the last and the current line when a change is seen. (Skip printing if last is empty, otherwise we get an empty row at the start, and remember to print the very last line in the END block.)

$ awk '($5 != lastpid) { if (last) print last; print $0 }
       { lastpid = $5; last = $0 }
       END { print last }' < times
15 May 2017 03:43:40 696
15 May 2017 03:45:02 696
15 May 2017 23:30:22 9502
15 May 2017 23:52:42 9502
15 May 2017 02:18:32 12795
15 May 2017 02:20:02 12795
15 May 2017 02:33:39 13674
15 May 2017 02:35:42 13674
15 May 2017 18:52:28 19143
15 May 2017 18:54:40 19143
ilkkachu
  • You are assuming here that the file **2017-05-15.log** contains the data in the format given in the question, which is not the case. That file actually contains a lot of spurious data, which I removed with `cat 2017-05-15.log | grep 'Executing ETL' | tr -s ' ' | cut -f2,3,4,5,8 -d' ' | sort -k5 -n`. This minor issue aside, your answer works if I redirect my command's output to a file and use that file in your command. Thanks a lot! :) – RodrikTheReader May 16 '17 at 10:38
  • @RodrikTheReader, yes you're right, I seem to be a bit slow today. We could probably awk through the whole ordeal, without using grep and cut in between, but without seeing the original file format, that's a bit hard to do. (Having the input sorted makes the awk part easier, without sorting the awk script would need to keep track of all the PIDs at the same time.) You could also skip the temporary file and just put the `awk` in the pipeline after the `sort`. – ilkkachu May 16 '17 at 10:52
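As a rough sketch of that pipeline suggestion: the filter/cut/sort stages can feed straight into the awk step with no temporary file. The sample log below is invented for illustration (the real log's layout isn't shown in the question); it only assumes what the original pipeline assumes, namely that the timestamp sits in fields 2–5 and the thread ID in field 8 of the matching lines.

```shell
# Hypothetical sample log in the assumed raw format; the real file differs.
cat > sample.log <<'EOF'
X 15 May 2017 03:43:40 a b 696 Executing ETL
X 15 May 2017 03:44:35 a b 696 Executing ETL
X 15 May 2017 03:45:02 a b 696 Executing ETL
X 15 May 2017 02:18:32 a b 12795 Executing ETL
X 15 May 2017 02:20:02 a b 12795 Executing ETL
some spurious line that grep filters out
EOF

# Same pipeline as the question, with the answer's awk appended after sort.
grep 'Executing ETL' sample.log | tr -s ' ' | cut -f2,3,4,5,8 -d' ' \
  | sort -k5 -n \
  | awk '($5 != lastpid) { if (last) print last; print $0 }
         { lastpid = $5; last = $0 }
         END { print last }'
```

With this sample input it prints the first and last line of each PID group (four lines total). Note that within equal PID keys, `sort`'s last-resort whole-line comparison happens to keep the lines in time order here because the timestamp is lexicographically sortable.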

Another approach in awk: hash each PID's first and last lines, and print them at the end. If a PID has only one entry, only that one line will be output:

$ awk '
{
    if($5 in f)                              # if first exists
        l[$5]=$0                             # update last
    else f[$5]=$0 }                          # else first
END {
    for(i in f)                              # loop all firsts
        print f[i] ((i in l)?ORS l[i]:"") }  # output firsts and lasts if exist
' file
15 May 2017 03:43:40 696
15 May 2017 03:45:02 696
15 May 2017 23:30:22 9502
15 May 2017 23:52:42 9502
15 May 2017 02:18:32 12795
15 May 2017 02:20:02 12795
15 May 2017 02:33:39 13674
15 May 2017 02:35:42 13674
15 May 2017 18:52:28 19143
15 May 2017 18:54:40 19143
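One caveat worth noting (my addition, not part of the answer): the traversal order of `for (i in f)` is unspecified in awk, so the groups are not guaranteed to come out in PID order. GNU awk can be asked for numerically sorted traversal via `PROCINFO["sorted_in"]`; a sketch, using a small file in the already-cut format (fields 1–4 timestamp, field 5 PID):

```shell
# Sample input in the cut-down format the answers operate on.
cat > times <<'EOF'
15 May 2017 23:30:22 9502
15 May 2017 23:52:42 9502
15 May 2017 03:43:40 696
15 May 2017 03:45:02 696
EOF

# GNU awk only: PROCINFO["sorted_in"] controls for-in traversal order.
gawk '
{ if ($5 in f) l[$5] = $0; else f[$5] = $0 }
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"   # ascending numeric key order
    for (i in f)
        print f[i] ((i in l) ? ORS l[i] : "")
}' times
```

This prints the 696 pair before the 9502 pair regardless of input order. On awks without `PROCINFO` (mawk, BSD awk), piping the result through `sort -k5 -n` achieves the same effect.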
James Brown