AWK: is it possible to read a time field and use it for sorting?

Question

I have two files and I need to sort and merge the rows based on the time column:

File A:

"2014-02-26 16:03:04"   "Login Success|isNoSession=false"   id=csr,ou=user,dc=openam,dc=forgerock,dc=org    7efb2f0e035a0e3d01  10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-100  DataStore   "Not Available" 10.17.174.30

File B:

"2014-02-26 16:02:27"   "Login Failed"  dennis  "Not Available" 10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-200  DataStore   "Not Available" 10.17.174.30    
"2014-02-26 16:02:37"   "Login Failed"  purva   "Not Available" 10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-200  DataStore   "Not Available" 10.17.174.30

I need to merge the files (pretty standard), but I have to insert the rows into the final file based on the time found in column 1. I have several other items to modify for each line, but I'm pretty sure I can figure that out. The sorting based on the time column has me stumped.

So in this case I would have a file with the line from File A at the end.

Other details.

Just to refresh myself on gawk I was working on parsing the first file. Here is what I have so far:

#!/bin/awk -f
BEGIN {
    FS="\t";
}
{
    # if we have more than 12 fields for the current row, proceed
    if ( NF > 12 )
    {
        # start looking for the user name
        n = split( $3, var1, ",");
        if (n > 4)
        {
            n2 = split (var1[1], var2, "=");
            if (n2 >= 2)
            {
                # Ignore any line where we do not have "id=xxxxx,..."
                if (var2[1] == "id")
                {
                    print $1, "N/A", "N/A", $12, $5, $5, var2[2]
                }
            }
        }
    }
}
END {
    print "Total Number of records=" NR
}

I probably need to move that into a function to make it easier since I'm going to be processing two files at the same time.

If you just concatenated the two files and then sorted the concatenated file by the date/time field using the system `sort`, would that get you the result that you need? I note that the date/time format is such that it can be sorted alphabetically to have the dates and times in chronological order. — Simon, Feb 26 '14 at 21:18
The sorting has to be based off of the actual time, not a character sort. There is a better way of saying that, I hope you get my meaning. — D-Klotz, Feb 26 '14 at 21:20
@D-Klotz: our question is: how does a character sort _differ_ from an 'actual time' sort in this case? 'cause in that date/time format, they are one and the same IMO. — Wrikken, Feb 26 '14 at 21:24
@D-Klotz: It seems to me that the way the date and time are formatted for this case, a character sort and a time-based sort would give exactly the same result. If there are any examples where they would give a different result, could you show those? — Simon, Feb 26 '14 at 21:26
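
The point made in these comments can be checked directly. Timestamps written as YYYY-MM-DD HH:MM:SS are fixed-width, with the most significant part first, so a plain character sort should put them in chronological order. A minimal sketch with made-up timestamps (the function name `sort_timestamps` is hypothetical, and `LC_ALL=C` is an assumption added only to make the byte-wise comparison explicit):

```shell
# Hypothetical timestamps, deliberately out of order.
sort_timestamps() {
    printf '%s\n' \
        '2014-02-26 16:03:04' \
        '2014-02-26 16:02:27' \
        '2013-12-31 23:59:59' \
        '2014-02-26 09:02:37' |
    LC_ALL=C sort    # plain character sort, no date parsing involved
}

sort_timestamps
```

The earliest timestamp comes out first and the latest last, the same order a time-based sort would give. That only holds while every timestamp uses the same zero-padded format and the same time zone.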

Birei · Accepted Answer · 2014-02-26T21:44:09.017

1

Based on the linux and bash tags, you can concatenate both files, sort them by the first field, and then apply your awk command to the result:

cat fileA fileB | sort -t$'\t' -s -k1,1 | awk -f script.awk
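
As a rough guide to what the `sort` options in that pipeline do, here is a small sketch on made-up data. The scratch directory, the shortened sample rows, and the function name `merge_by_time` are all hypothetical; `LC_ALL=C` is an assumption added for reproducible ordering, and `$(printf '\t')` is used as a portable stand-in for bash's `$'\t'`.

```shell
# Hypothetical sample data: two small tab-separated files in a scratch directory.
dir=$(mktemp -d)
tab=$(printf '\t')   # a literal tab character

printf '"2014-02-26 16:03:04"\t"Login Success"\tcsr\n' > "$dir/fileA"
printf '"2014-02-26 16:02:27"\t"Login Failed"\tdennis\n"2014-02-26 16:02:37"\t"Login Failed"\tpurva\n' > "$dir/fileB"

merge_by_time() {
    # -t "$tab" : fields are separated by tabs, so the space inside the
    #             timestamp does not split the key
    # -s        : stable sort, lines with equal keys keep their input order
    # -k1,1     : the sort key starts and ends at field 1 (the timestamp)
    cat "$@" | LC_ALL=C sort -t "$tab" -s -k1,1
}

merge_by_time "$dir/fileA" "$dir/fileB"
```

With this data the two rows from fileB come first and the row from fileA comes last, which matches what the question asks for.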

edited Feb 26 '14 at 21:44

answered Feb 26 '14 at 21:26


shazam ! This looks promising. I'm a bit of an old linux nerd but I never used sort. Can you explain the arguments you have? Thanks – D-Klotz Feb 26 '14 at 21:31
I'll look it up. I'll stop being lazy – D-Klotz Feb 26 '14 at 21:33
I couldn't find what the last "-" means within the sort portion. – D-Klotz Feb 26 '14 at 21:38
Its meaning was to read input from the pipe, but it's not needed for the `sort` command, so you can get rid of it (I've edited to remove it). – Birei Feb 26 '14 at 21:41

score 1 · Answer 2 · answered Feb 26 '14 at 21:49

It takes a little extra work, but if you'd like to do it entirely in awk (GNU awk), you'll have to use the mktime and strftime functions.

Here is a hint:

awk '{
    # Split the time field into its parts: YYYY MM DD HH MM SS
    split($0, t, /[-: ]/)
    # mktime() expects a space-separated "YYYY MM DD HH MM SS" string
    patt = t[1] " " t[2] " " t[3] " " t[4] " " t[5] " " t[6]
    # Use the epoch seconds as the array index (duplicate times collapse into one entry)
    time[mktime(patt)]++
}
END {
    # Sort the indices numerically into s_time (the third argument needs gawk 4.0+);
    # x is the number of entries
    x = asorti(time, s_time, "@ind_num_asc")
    # Iterate over the sorted array and print each time in the desired format
    for (i = 1; i <= x; i++) {
        print strftime("%Y-%m-%d %T", s_time[i])
    }
}' file

$ cat file
2014-02-26 16:03:04
2017-02-26 16:02:27
2012-02-26 16:02:37

$ awk '{
    split($0, t, /[-: ]/)
    patt = t[1] " " t[2] " " t[3] " " t[4] " " t[5] " " t[6]
    time[mktime(patt)]++
}
END {
    x = asorti(time, s_time, "@ind_num_asc")
    for (i = 1; i <= x; i++) {
        print strftime("%Y-%m-%d %T", s_time[i])
    }
}' file
2012-02-26 16:02:37
2014-02-26 16:03:04
2017-02-26 16:02:27

@D-Klotz It will be a good learning experience. If you have any questions feel free to reach out. Good luck! — jaypal singh, Feb 26 '14 at 22:43
