3

I'm fairly new to linux/bash shell and I'm really having trouble printing two values (the highest and lowest) from a particular column in a text file. The file is formatted like this:

Geoff        Audi           2:22:35.227
Bob          Mercedes       1:24:22.338
Derek        Jaguar         1:19:77.693
Dave         Ferrari        1:08:22.921

As you can see the final column is a timing, I'm trying to use awk to print out the highest and lowest timing in the column. I'm really stumped, I've tried:

awk '{print sort -n <  $NF}' timings.txt 

However that didn't even seem to sort anything, I just received an output of:

1
0
1
0
...

Repeating over and over, it went on for longer but I didn't want a massive line of it when you get the point after the first couple iterations.

My desired output would be:

Min: 1:08:22.921
Max: 2:22:35.227   
  • Have you tried something like this? They split the last column into 3 columns (hours, minutes, seconds) and use them for sorting https://stackoverflow.com/questions/22053402/awk-is-it-possible-to-read-a-time-field-and-use-it-for-sorting – Dr Fabio Gori Oct 11 '17 at 16:23
  • Just to add, I realise that the output I tried was to sort them in order, that was just me testing out if that would work in determining highest and lowest - not whether it would print out the highest and lowest, it was kind of just a starting point. The ultimate aim is to print out the highest timing and the lowest timing. –  Oct 11 '17 at 16:25
  • Ill have a go at what's described in that link you sent and update results. –  Oct 11 '17 at 16:26
  • Do you REALLY have a header line in your input file? [edit] your question to provide a few lines of input (we can't usefully sort 1 line!) plus the desired output. Make sure to include in the example all of the edge cases that you think will be hard for a script to handle. – Ed Morton Oct 11 '17 at 16:31
  • Is your column 3 containing a duration or a time? – Marc Lambrichs Oct 11 '17 at 16:35
  • The time it took for someone to complete a race. No sorry they don't have headers it was just how I formatted the question. –  Oct 11 '17 at 16:39
  • Why would you expect that to work? – 123 Oct 11 '17 at 16:40
  • Because I'm basically completely new to bash and awk. –  Oct 11 '17 at 16:42
  • @EdMorton, I've updated the table for some clarity. Sorry it was so basic to start with. –  Oct 11 '17 at 16:55
  • @EdMorton, Yes they're always a single digit. –  Oct 11 '17 at 17:07

3 Answers3

2

After question clarifications: if the time field always has a same number of digits in the same place, e.g. h:mm:ss.ss, the solution can be drastically simplified. Namely, we don't need to convert time to seconds to compare it anymore, we can do a simple string/lexicographical comparison:

$ awk 'NR==1 {m=M=$3} {$3<m&&m=$3; $3>M&&M=$3} END {printf("min: %s\nmax: %s",m,M)}' file
min: 1:08:22.921
max: 2:22:35.227

The logic is the same as in the (previous) script below, just using a simpler string-only based comparison for ordering values (determining min/max). We can do that since we know all timings will conform to the same format, and if a < b (for example "1:22:33" < "1:23:00") we know a is "smaller" than b. (If values are not consistently formatted, then by using the lexicographical comparison alone, we can't order them, e.g. "12:00:00" < "3:00:00".)

So, on first value read (first record, NR==1), we set the initial min/max value to the timing read (in the 3rd field). For each record we test if the current value is smaller than the current min, and if it is, we set the new min. Similarly for the max. We use short circuiting instead if to make expressions shorter ($3<m && m=$3 is equivalent to if ($3<m) m=$3). In the END we simply print the result.


Here's a general awk solution that accepts time strings with variable number of digits for hours/minutes/seconds per record:

$ awk '{split($3,t,":"); s=t[3]+60*(t[2]+60*t[1]); if (s<min||NR==1) {min=s;min_t=$3}; if (s>max||NR==1) {max=s;max_t=$3}} END{print "min:",min_t; print "max:",max_t}' file
min: 1:22:35.227
max: 10:22:35.228

Or, in a more readable form:

#!/usr/bin/awk -f
{
    split($3, t, ":")
    s = t[3] + 60 * (t[2] + 60 * t[1])
    if (s < min || NR == 1) {
        min = s
        min_t = $3
    }
    if (s > max || NR == 1) {
        max = s
        max_t = $3
    }
}

END {
    print "min:", min_t
    print "max:", max_t
}

For each line, we convert the time components (hours, minutes, seconds) from the third field to seconds which we can later simply compare as numbers. As we iterate, we track the current min val and max val, printing them in the END. Initial values for min and max are taken from the first line (NR==1).

randomir
  • 17,989
  • 1
  • 40
  • 55
  • 1
    You don't have to do any of the time math, just treat the the whole time as a single number – 123 Oct 11 '17 at 16:44
  • Thanks, this answered my question, but more importantly it's helping me understand what's going on at every step. I'm completely new to this kind of stuff so it helps having an explanation along with the solution. –  Oct 11 '17 at 16:44
  • @123, didn't know that. Thanks, I'll update my answer. – randomir Oct 11 '17 at 16:49
  • @EdMorton, you're right, that's better approach, will fix. Thanks! – randomir Oct 11 '17 at 16:50
  • @123, turns out you can't do it like that. If you convert the whole time field to number, you'll get just the leading number (hours). If you compare values as strings, `21:..` will come before `3:..`. String compare works only in the special case all values conform to the single format (1-digit hours). – randomir Oct 11 '17 at 17:40
  • @randomir, Also for future reference, am I right in thinking if I wanted to do this on a different table, to alter which column it works on I simply change all the $3 to $X where X is the column I wish to perform it on? –  Oct 11 '17 at 18:27
  • @SeanC, that's right, `$n` is the nth field. Just change `$3` to `$n`. – randomir Oct 11 '17 at 18:39
  • @randomir, Thanks for the great answer! I was wondering in the after question clarification solution could you explain the steps as you did with the first solution as I'm slightly confused as to how it works. Again I'm completely new to awk so sorry if I'm asking silly questions! –  Oct 13 '17 at 11:25
  • @SeanC, you're welcome! I've added an explanation for the first program, hope that helps. Happy `awk`-ing! :) – randomir Oct 13 '17 at 12:19
1

Given your statements that the time field is actually a duration and the hours component is always a single digit, this is all you need:

$ awk 'NR==1{min=max=$3} {min=(min<$3?min:$3); max=(max>$3?max:$3)} END{print "Min:", min ORS "Max:", max}' file
Min: 1:08:22.921
Max: 2:22:35.227
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
0

You don't want to run sort inside of awk (even with the proper syntax).

Try this:

sed 1d timings.txt | sort -k3,3n | sed -n '1p; $p'

where

  • the first sed will remove the header
  • sort on the 3rd column numerically
  • the second sed will print the first and last line
glenn jackman
  • 238,783
  • 38
  • 220
  • 352
  • Hi, thanks for the reply! I've used this and understand each step, however I'm only getting one output - the output I am getting however is indeed the largest time in the list but the smallest doesn't seem to be showing? A possible reason although I'm not exactly advanced with this stuff is that the largest time is the only one that has a 2 before the colon, all the others begin with a 1 - could it be that it is printing the largest but not the others because it is only sorting by the first number before the colon. Doesn't make sense since it should still print the first line however. –  Oct 11 '17 at 16:35