I have been given an input file that looks like this:

ID          PID   PPID   C STIME  TTY            TIME  CMD
adz110     5344   5334   0 08:47  pts/2     00:00:00  bash
dmq292     6908   6854   0 08:53  pts/1     00:00:00  bash
adz110     7227   7150   0 08:54  pts/9     00:00:00  who
erg474     7466   7461   0 08:54  pts/10    00:00:00  ls
dmq292     7966   7960   0 08:55  pts/13    00:00:00  assign1.sh if of
xle135     8636   8628   0 08:58  pts/15    00:00:00  bash
xle135     8983   8636   0 08:59  pts/15    00:00:00  ssh ctf.cs.utsarr.net
zeh458     9057   1980   0 08:59  pts/7     00:00:00  vim prog.c
HanSolo    9150   9139   0 08:59  pts/16    00:00:00  ps -af

and the output needs to look like this:

User: adz110
    bash
    who
User: dmq292
    bash
    assign1.sh if of
User: erg474
    ls
User: xle135
    bash
    ssh ctf.cs.utsarr.net
User: zeh458
    vim prog.c
Earliest  Start  Time:
adz110    5344   5334 0 08:47  pts/2     00:00:00  bash

Latest  Start  Time
xle135    8983   8636   0 08:59  pts /15    00:00:00  ssh ctf.cs.utsarr.net

What I have come up with so far is in a file.awk that I wrote:

/[a-z]{3}[0-9]{3}/    
{   
    print $1
    if( match($1 , arg1) )
    {
    for(i=8; i <= NF ; i ++) 

       printf("%s", $i); 

    printf("\n"); 

    }

}
END {
        print " Earliest Start Time:" 

        print "Latest Start Time: "
 }

So instead of printing the commands for each user whose username matches [a-z]{3}[0-9]{3}, it prints the command along with the username, without any spaces. I am having a lot of trouble understanding associative arrays and the use of special variables like NR, RS, etc.

  • Is there really a space after `pts`? – Cyrus Mar 05 '19 at 05:19
  • There are no associative arrays here. It is printing precisely what you are asking it to print. – tripleee Mar 05 '19 at 05:24
  • So I was able to work out the logic for whether the user is a duplicate or not by doing `if (a[$1]++)`; would that count as an associative array? I also need to take into account the lines with the earliest and the latest start times, right? @Cyrus I fixed those extra spaces that were added incorrectly – Malachy O'Sullivan Mar 05 '19 at 05:32
  • Why are there two spaces in `vim prog.c`? – Til Mar 05 '19 at 05:44
  • Why isn't HanSolo listed in the required output? Is it because the user name doesn't match 3 letters, 3 digits? – Jonathan Leffler Mar 05 '19 at 05:46
  • @Jonathan Exactly, so I am trying to get rid of it by using a regex, but it isn't working well – Malachy O'Sullivan Mar 05 '19 at 05:48
  • Where is the sorting occurring? If all goes well, the title line vanishes because it doesn't match the three letter, three digit criterion (as does HanSolo). You're also not tracking start times and end times. What logic do you plan to use there? – Jonathan Leffler Mar 05 '19 at 05:49
  • @JonathanLeffler sed | awk -f file.awk fileToTest.txt – Malachy O'Sullivan Mar 05 '19 at 05:50
  • With `sed '…' | awk -f file.awk fileToTest.txt`, the output from `sed` is ignored by `awk` because `awk` only reads `fileToTest.txt` for the data (and `file.awk` for the script, of course). So, please check your comment and either edit it or delete it and add the corrected material to the question (or create a new comment with correct material). – Jonathan Leffler Mar 05 '19 at 05:53
  • If user `zbc123` is running 3 instances of `bash`, should `bash` appear 3 times in the output for that user or just once? – Ed Morton Mar 05 '19 at 16:06

4 Answers

You want to apply the regex only against the first field, and collect the values for each user in memory in a format suitable for printing out.

Observe that the listing is fixed-width; so the program name and arguments are in a field which starts in column 55. Similarly, the time stamp is in columns 28-32.

awk 'NR > 1 && $1 ~ /^[a-z]{3}[0-9]{3}$/ {
  when = substr($0, 28, 5)
  command = substr($0, 55)  
  if ($1 in user) {
      # Append this command to previous value
      user[$1] = user[$1] ORS "    " command
  } else {
      # Create a new item in this associative array
      user[$1] = "    " command
  }
  if (NR==2 || when > max) { max=when; maxcmd = $0 }
  if (NR==2 || when < min) { min=when; mincmd = $0 }
}
END {
    # XXX TODO: sort array?
    for (u in user) printf "User: %s\n%s\n", u, user[u]
    print "Earliest start time"; print mincmd
    print "Latest start time"; print maxcmd
}' filename

So the first time we see a particular user, they will not be in user, and so we just put their first command, indented by four spaces, as the value for that key in the associative array. If we see them again, we append another line to the value, with a newline (ORS) and four spaces in front.

NR is the current line number -- we skip NR==1 to avoid capturing the header line, and with NR==2 we know that this is the first line we are processing, so we set max and min to their baseline values.

Ideally, you should also normalize the timestamps into a canonical form so that you can sort "yesterday 21:24" before "21:23", but I'm not going there with this simple answer. You probably want to add embellishments in various other places, too. (Or maybe you could rely on the fact that the input seems to be sorted by process start time?)
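
A minimal sketch of that normalization, under stated assumptions: an HH:MM value becomes minutes since midnight, and any other STIME value (ps typically shows a date such as Mar04 for a process started on an earlier day) is treated as older than everything started today. The names stime_demo and stime_key and the -1 sentinel are made up for this illustration.

```shell
# Sketch only: stime_demo and stime_key are hypothetical names.
stime_demo() {
  awk '
  # Minutes since midnight for HH:MM; -1 for anything else
  # (assumption: a non-HH:MM value means "started on an earlier day").
  function stime_key(s,   t) {
      if (s ~ /^[0-9][0-9]:[0-9][0-9]$/) {
          split(s, t, ":")
          return t[1] * 60 + t[2]
      }
      return -1
  }
  NR > 1 {
      key = stime_key($5)
      if (!init || key < min) { min = key; minline = $0 }
      if (!init || key > max) { max = key; maxline = $0 }
      init = 1
  }
  END { print "Earliest: " minline; print "Latest: " maxline }
  ' "$@"
}

# Made-up sample data: one process started on an earlier day.
printf '%s\n' \
  'ID PID PPID C STIME TTY TIME CMD' \
  'abc123 10 1 0 21:23 pts/1 00:00:00 bash' \
  'abc123 9 1 0 Mar04 pts/0 00:00:00 sshd' \
  'xyz789 11 1 0 21:24 pts/2 00:00:00 vim' | stime_demo
```

This cannot order two date-style values against each other; that would need real date parsing, which is beyond this sketch.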

If you want the user names to be sorted, GNU Awk has array sorting built-in; for other Awks, you will need to write your own simple sort function, or use an external pipeline.
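
As a rough illustration of the "write your own simple sort function" route, here is a sketch that collects the user names into an indexed array and runs a plain insertion sort over it, using only features I would expect any POSIX awk to have. The names users_sorted, names and seen are invented for this example. With GNU Awk you could instead set PROCINFO["sorted_in"] = "@ind_str_asc" before the for-in loop, and the external pipeline route is simply piping the user lines through sort.

```shell
# Sketch only: users_sorted, names and seen are hypothetical names.
users_sorted() {
  awk '
  # The character classes are spelled out instead of using {3} so that this
  # also runs on awks without interval expressions.
  NR > 1 && $1 ~ /^[a-z][a-z][a-z][0-9][0-9][0-9]$/ {
      if (!($1 in seen)) { seen[$1]; names[++n] = $1 }
  }
  END {
      # Insertion sort on names[1..n], comparing as strings.
      for (i = 2; i <= n; i++) {
          v = names[i]
          for (j = i - 1; j >= 1 && names[j] > v; j--) names[j + 1] = names[j]
          names[j + 1] = v
      }
      for (i = 1; i <= n; i++) print "User: " names[i]
  }' "$@"
}

# Made-up sample data in unsorted order.
printf '%s\n' \
  'ID PID PPID C STIME TTY TIME CMD' \
  'xle135 8636 8628 0 08:58 pts/15 00:00:00 bash' \
  'adz110 5344 5334 0 08:47 pts/2 00:00:00 bash' \
  'HanSolo 9150 9139 0 08:59 pts/16 00:00:00 ps -af' \
  'dmq292 6908 6854 0 08:53 pts/1 00:00:00 bash' \
  'adz110 7227 7150 0 08:54 pts/9 00:00:00 who' | users_sorted
```

Insertion sort is quadratic, which should be fine for the handful of users in a process listing.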

tripleee

Try this, file.awk:

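# Skip the header and any user name that is not exactly three letters plus three digits
$1 !~ /^[a-z]{3}[0-9]{3}$/ {next;}
# First matching line: initialise the earliest and latest start times
!fstTime {fstTime=$5; lstTime=$5; first=$0; last = $0;}
# Remember each user once, in input order
!($1 in a) {a[$1];users[++ind]=$1;}
{   # Rebuild the command from field 8 onwards
    cmd=$8; for(i=9;i<=NF;i++) cmd=cmd OFS $i;
    cmds[$1] = cmds[$1] ? cmds[$1] "\n    " cmd : "    " cmd;
    if ($5 < fstTime) { fstTime=$5; first=$0; }
    if ($5 > lstTime) { lstTime=$5; last = $0; }
}
END { 
    # ind is the number of distinct users (length(array) is not supported by every awk)
    for(i=1;i<=ind;i++) {
        print "User: " users[i];
        print cmds[users[i]];
    }
    print "Earliest  Start  Time:\n" first "\n\nLatest  Start  Time:\n" last; 
}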

awk -f file.awk fileToTest.txt yields:

User: adz110
    bash
    who
User: dmq292
    bash
    assign1.sh if of
User: erg474
    ls
User: xle135
    bash
    ssh ctf.cs.utsarr.net
User: zeh458
    vim prog.c
Earliest  Start  Time:
adz110     5344   5334   0 08:47  pts/2     00:00:00  bash

Latest  Start  Time:
xle135     8983   8636   0 08:59  pts/15    00:00:00  ssh ctf.cs.utsarr.net

If you want the line zeh458 9057 1980 0 08:59 pts/7 00:00:00 vim prog.c to be reported as the Latest Start Time instead, just change ($5 > lstTime) to ($5 >= lstTime).

Til

There is one ambiguity in your input file: several processes share the same start time (e.g. xle135 and zeh458). The solution I have come up with sorts your input file (you could substitute your command for it if you need to), then works out the minimum and maximum start times across all entries. If several entries have the same time stamp, it does not concatenate them (though we could do that, too); instead, it prints only the last one in sorted order.

awk '
/^ID/{
  next
}
{
  split($5,array,":")
  seconds=array[1]*60+array[2]
}
FNR==NR{
  a[$1]++
  tim[seconds]=$0
  next
}
a[$1]==2{
  print "User: " $1  ORS "    " $NF
  getline
  sub(/.*:00/,"")
  print"  " $0
}
a[$1]==1{
  val=$0
  sub(/.*:00/,"",val)
  print "User: " $1 ORS "    " val
}
{
  min=min<seconds?(min?min:seconds):seconds
  max=max>seconds?max:seconds
}
END{
  print "Earliest  Start  Time:" ORS tim[min] ORS "Latest  Start  Time" ORS tim[max]
}
' <(sort -k1,5 Input_file)  <(sort -k1,5 Input_file)

Output will be as follows.

User: adz110
    bash
    who
User: dmq292
    bash
    assign1.sh if of
User: erg474
      ls
User: HanSolo
      ps -af
User: xle135
    bash
    ssh ctf.cs.utsarr.net
User: zeh458
      vim  prog.c
Earliest  Start  Time:
adz110     5344   5334   0 08:47  pts/2     00:00:00  bash
Latest  Start  Time
zeh458     9057   1980   0 08:59  pts/7     00:00:00  vim  prog.c
tripleee
RavinderSingh13
  • Why do you need to read the sorted input twice? Wouldn't it be easier to sort on the fifth field and then extract the other stuff in one go? – tripleee Mar 05 '19 at 06:40
  • @tripleee, It is there to handle the case where a first field has only one line (in the given example most users have more than one line); in that case we need not jump to the next line, so I handled it that way. If the OP clarifies further, I could try to do it in a single read as well. – RavinderSingh13 Mar 05 '19 at 08:22

Assuming that if a given user is running the same command multiple times you want that command to appear multiple times in the output for that user:

$ cat tst.awk
NR == 1 { next }
/^[a-z]{3}[0-9]{3}/ {
    user = $1
    users[user]

    cmd = $0
    sub(/([^[:space:]]+[[:space:]]+){7}/,"",cmd)
    cmds[user,++numCmds[user]] = cmd

    stime = $5
    if ( (earliestStime == "") || (stime < earliestStime) ) {
        earliestStime = stime
        earliestData  = $0
    }
    if ( (latestStime == "") || (stime > latestStime) ) {
        latestStime = stime
        latestData  = $0
    }
}
END {
    for (user in users) {
        printf "User: %s\n", user
        for (cmdNr=1; cmdNr<=numCmds[user]; cmdNr++) {
            printf "   %s\n", cmds[user,cmdNr]
        }
    }

    print "\nEarliest  Start  Time:"
    print earliestData

    print "\nLatest  Start  Time:"
    print latestData
}

.

$ awk -f tst.awk file
User: xle135
   bash
   ssh ctf.cs.utsarr.net
User: zeh458
   vim prog.c
User: dmq292
   bash
   assign1.sh if of
User: erg474
   ls
User: adz110
   bash
   who

Earliest  Start  Time:
adz110     5344   5334   0 08:47  pts/2     00:00:00  bash

Latest  Start  Time:
xle135     8983   8636   0 08:59  pts/15    00:00:00  ssh ctf.cs.utsarr.net

If the order of users or commands in the output matters then tell us what the order should be and it'll be easy enough to adapt, e.g. by retaining the input order or setting PROCINFO["sorted_in"] with GNU awk before each loop in the END section.
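
For instance, a hedged sketch of the PROCINFO["sorted_in"] variant, which needs GNU awk (other awks silently ignore the setting). It uses a simplified per-user string instead of the cmds[user,n] layout above, and the function name users_by_name is made up for this example.

```shell
# Sketch only: requires gawk; users_by_name is a hypothetical name.
users_by_name() {
  gawk '
  NR == 1 { next }
  /^[a-z]{3}[0-9]{3}/ {
      cmd = $0
      sub(/([^[:space:]]+[[:space:]]+){7}/, "", cmd)   # drop the first 7 fields
      cmds[$1] = cmds[$1] "   " cmd "\n"
  }
  END {
      # gawk-specific: make for-in visit the indices in ascending string order
      PROCINFO["sorted_in"] = "@ind_str_asc"
      for (user in cmds) printf "User: %s\n%s", user, cmds[user]
  }' "$@"
}

# Made-up sample data in unsorted order.
printf '%s\n' \
  'ID PID PPID C STIME TTY TIME CMD' \
  'xle135 8636 8628 0 08:58 pts/15 00:00:00 bash' \
  'adz110 5344 5334 0 08:47 pts/2 00:00:00 bash' \
  'xle135 8983 8636 0 08:59 pts/15 00:00:00 ssh ctf.cs.utsarr.net' | users_by_name
```

Commands stay in input order within each user because they are appended as they are read; only the order of the users changes.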

Ed Morton