
I have two files and I need to sort and merge the rows based on the time column:

File A:

"2014-02-26 16:03:04"   "Login Success|isNoSession=false"   id=csr,ou=user,dc=openam,dc=forgerock,dc=org    7efb2f0e035a0e3d01  10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-100  DataStore   "Not Available" 10.17.174.30

File B:

"2014-02-26 16:02:27"   "Login Failed"  dennis  "Not Available" 10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-200  DataStore   "Not Available" 10.17.174.30    
"2014-02-26 16:02:37"   "Login Failed"  purva   "Not Available" 10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-200  DataStore   "Not Available" 10.17.174.30

I need to merge the files (pretty standard), but I have to insert the rows into the final file based on the time found in column 1. I have several other items to modify for each line, but I'm pretty sure I can figure that out. Sorting on the time column is what has me stumped.

So in this case I would have a file with the line from File A at the end.

Other details.

Just to refresh myself on gawk I was working on parsing the first file. Here is what I have so far:

#!/bin/awk -f
BEGIN {
    FS="\t";
}
{
    # if we have more than 12 fields for the current row, proceed
    if ( NF > 12 )
    {
        # start looking for the user name
        n = split( $3, var1, ",");
        if (n > 4)
        {
            n2 = split (var1[1], var2, "=");
            if (n2 >= 2)
            {
                # Ignore any line where we do not have "id=xxxxx,..."
                if (var2[1] == "id")
                {
                    print $1, "N/A", "N/A", $12, $5, $5, var2[2]
                }
            }
        }
    }
}
END {
    print "Total Number of records=" NR
}

I probably need to move that into a function to make it easier since I'm going to be processing two files at the same time.
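
A rough sketch of what that refactor might look like, under some assumptions: `user_from_dn` and `extract_user` are made-up names, the sample rows are shortened stand-ins for the real log lines, and the output is trimmed to two columns. It keeps the same checks as the script above (more than 4 comma-separated parts, first part of the form `id=...`).

```shell
workdir=$(mktemp -d)

# Shortened, hypothetical rows: one File A style row and one File B style row.
printf '"2014-02-26 16:03:04"\t"Login Success|isNoSession=false"\tid=csr,ou=user,dc=openam,dc=forgerock,dc=org\n' > "$workdir/sample.log"
printf '"2014-02-26 16:02:27"\t"Login Failed"\tdennis\n' >> "$workdir/sample.log"

extract_user() {
    awk -F'\t' '
    # Returns the value of a leading "id=..." component of dn,
    # or an empty string when the field has any other shape.
    # parts and kv are local scratch arrays.
    function user_from_dn(dn,    parts, kv) {
        if (split(dn, parts, ",") <= 4) return ""
        if (split(parts[1], kv, "=") < 2) return ""
        return (kv[1] == "id") ? kv[2] : ""
    }
    {
        user = user_from_dn($3)
        if (user != "") print $1, user
    }' "$@"
}

extract_user "$workdir/sample.log"
```

Because the function takes the field as an argument, the same helper can be called for rows from either file, whatever column the user name ends up in.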

D-Klotz
  • If you just concatenated the two files and then sorted the concatenated file by the date/time field using the system `sort`, would that get you the result that you need? I note that the date/time format is such that it can be sorted alphabetically to have the dates and times in chronological order. – Simon Feb 26 '14 at 21:18
  • The sorting has to be based off of the actual time, not a character sort. There is a better way of saying that, I hope you get my meaning. – D-Klotz Feb 26 '14 at 21:20
  • @D-Klotz: our question is: how does a character sort _differ_ from an 'actual time' sort in this case? 'cause in that date/time format, they are one and the same IMO. – Wrikken Feb 26 '14 at 21:24
  • @D-Klotz: It seems to me that the way the date and time are formatted for this case, a character sort and a time-based sort would give exactly the same result. If there are any examples where they would give a different result, could you show those? – Simon Feb 26 '14 at 21:26
  • I think you are right – D-Klotz Feb 26 '14 at 21:30

2 Answers

Based on the linux and bash tags, you can concatenate both files, sort them by the first field, and then apply your awk command to the result:

cat fileA fileB | sort -t$'\t' -s -k1,1 | awk -f script.awk
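
To see the pipeline at work without the real logs, here is a small sketch on shortened, hypothetical rows (three columns instead of twelve). It also tries `sort -m`, which only merges and so assumes each input file is already in time order; that assumption is mine, not something stated in the question. `LC_ALL=C` forces a plain byte-wise comparison, and the tab is built with `printf` so the snippet does not depend on bash's `$'\t'`.

```shell
tab=$(printf '\t')
workdir=$(mktemp -d)

# Shortened, hypothetical rows in the same shape as File A and File B.
printf '"2014-02-26 16:03:04"\t"Login Success"\tcsr\n' > "$workdir/fileA"
printf '"2014-02-26 16:02:27"\t"Login Failed"\tdennis\n' > "$workdir/fileB"
printf '"2014-02-26 16:02:37"\t"Login Failed"\tpurva\n' >> "$workdir/fileB"

# Full sort of the concatenation; -s keeps rows with equal timestamps in input order.
merged=$(cat "$workdir/fileA" "$workdir/fileB" | LC_ALL=C sort -t "$tab" -s -k1,1)

# Merge-only variant: valid only when each input is already sorted on column 1.
merged_m=$(LC_ALL=C sort -m -t "$tab" -k1,1 "$workdir/fileA" "$workdir/fileB")

# Show just the timestamp and user columns of the merged result.
printf '%s\n' "$merged" | cut -f1,3
```

The row from fileA comes out last, as the question asks, and the result can be piped into `awk -f script.awk` exactly as above.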
Birei

It takes a little extra work, but if you'd like to do it completely in awk (GNU awk), you'll have to use the mktime and strftime functions.

Here is a hint:

awk '{
    # Split the time field so that you have a pattern of YYYY MM DD HH MM SS
    split($0, t, /[-: ]/); 
    patt = t[1] FS t[2] FS t[3] FS t[4] FS t[5] FS t[6];  
    # Convert the pattern to epoch seconds and use it as the array index
    time[mktime(patt)]++
}
END {
    # Sort the indices; asorti compares them as strings, which matches numeric
    # order as long as the epoch values have the same number of digits
    x = asorti(time, s_time)
    # Iterate over your new sorted array and print it in desired format
    for(i=1; i<=x; i++) {
        print strftime("%Y-%m-%d %T",s_time[i])
    }
}' file

$ cat file
2014-02-26 16:03:04
2017-02-26 16:02:27
2012-02-26 16:02:37

$ awk '{
    split($0, t, /[-: ]/); 
    patt = t[1] FS t[2] FS t[3] FS t[4] FS t[5] FS t[6];   
    time[mktime(patt)]++
}
END {
    x = asorti(time, s_time)
    for(i=1; i<=x; i++) {
        print strftime("%Y-%m-%d %T",s_time[i])
    }
}' file
2012-02-26 16:02:37
2014-02-26 16:03:04
2017-02-26 16:02:27
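
If whole rows need to be kept (the hint above only carries the timestamps), one alternative is a plain two-way merge with `getline`. This is a different technique from the mktime/asorti approach: it is only a sketch, it assumes each log is already in chronological order, and it relies on the point made in the comments that this timestamp format compares correctly as a string. `merge_by_time` and the shortened sample rows are made up for illustration.

```shell
merge_by_time() {
    # $1 and $2: tab-separated logs, each already in chronological order.
    awk -F'\t' -v a="$1" -v b="$2" '
    BEGIN {
        have_a = (getline line_a < a) > 0
        have_b = (getline line_b < b) > 0
        while (have_a || have_b) {
            split(line_a, fa, FS)
            split(line_b, fb, FS)
            # Take from A when B is exhausted or the A timestamp is not later.
            if (have_a && (!have_b || fa[1] <= fb[1])) {
                print line_a
                have_a = (getline line_a < a) > 0
            } else {
                print line_b
                have_b = (getline line_b < b) > 0
            }
        }
    }'
}

workdir=$(mktemp -d)
printf '"2014-02-26 16:03:04"\t"Login Success"\tcsr\n' > "$workdir/fileA"
printf '"2014-02-26 16:02:27"\t"Login Failed"\tdennis\n' > "$workdir/fileB"
printf '"2014-02-26 16:02:37"\t"Login Failed"\tpurva\n' >> "$workdir/fileB"

merge_by_time "$workdir/fileA" "$workdir/fileB"
```

Since it uses no GNU extensions, this version should also run under other awk implementations, and per-row rewriting can be added where the lines are printed.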
jaypal singh