5

This is with gawk 4.0.0, running on Windows 7 with cygwin. The program is invoked like

gawk -f procjournal.gawk testdata

I have some data that looks like this:

"Date";"Type";"Amount";"Balance"
"6/11/2013 11:51:17 AM";"Transaction Tax";-427.5;399313884.46
"6/11/2013 11:51:17 AM";"Market Transaction";47500;399314311.96
"6/11/2013 11:12:42 AM";"Transaction Tax";-549.92;399266811.96
"6/11/2013 11:12:42 AM";"Market Transaction";61101.78;399267361.88

I want to extract the lines for transactions, strip the date part from the 1st field, and reformat the time stamp as a decimal. I thought I could do it with this awk program:

FS=";"
OFS=";"
/Market Transaction/ {
    split($1, itemdate, " ");
    tmp = itemdate[2];
    split(tmp, hms, ":");
    timestamp = hms[3] + (hms[2] * 60) + (hms[1] * 3600);
    if (itemdate[3] == "AM")
       timestamp += 12 * 3600;
    timestamp /= 3600.0;
    $1 = timestamp;
    print;
}

but my output looks like this:

"Date";"Type";"Amount";"Balance"
"Date";"Type";"Amount";"Balance"
"6/11/2013 11:51:17 AM";"Transaction Tax";-427.5;399313884.46
"6/11/2013 11:51:17 AM";"Transaction Tax";-427.5;399313884.46
"6/11/2013 11:51:17 AM";"Market Transaction";47500;399314311.96
"6/11/2013 11:51:17 AM";"Market Transaction";47500;399314311.96
11.8547;"Market Transaction";47500;399314311.96
"6/11/2013 11:12:42 AM";"Transaction Tax";-549.92;399266811.96
"6/11/2013 11:12:42 AM";"Transaction Tax";-549.92;399266811.96
"6/11/2013 11:12:42 AM";"Market Transaction";61101.78;399267361.88
"6/11/2013 11:12:42 AM";"Market Transaction";61101.78;399267361.88
11.2117;"Market Transaction";61101.78;399267361.88

Why are non-matching lines being printed, and how do I suppress that?

wades
  • 927
  • 9
  • 24
  • 1
    You have FS=";" outside of an action block and therefore it's a condition. As such it evaluates to true and invokes the default action of printing the current record. Ditto for OFS=";". See @Jaypal's answer for how to fix that, and it's also a good idea to use `FS=OFS=";"` for conciseness when you want both to have the same value. – Ed Morton Jun 12 '13 at 13:56

2 Answers2

6

Change the script to include the separators in BEGIN block.

Script Content:

BEGIN {
FS=";"
OFS=";"
}
/Market Transaction/ {
    split($1, itemdate, " ");
    tmp = itemdate[2];
    split(tmp, hms, ":");
    timestamp = hms[3] + (hms[2] * 60) + (hms[1] * 3600);
    if (itemdate[3] == "AM")
       timestamp += 12 * 3600;
    timestamp /= 3600.0;
    $1 = timestamp;
    print;
}
jaypal singh
  • 74,723
  • 23
  • 102
  • 147
  • Don't think this is it. The program is stored in a file, invoked with `gawk -f`. When I tried your suggestion I got `backslash not last character on line`. – wades Jun 11 '13 at 20:54
  • 1
    The big problem is that the OP changes the input and output field separators for each line, and does very strange and bad things. Put them inside the `BEGIN` block and magically all will work, or I hope so. Those backslashes have non sense for me. It gives same error in Linux. – Birei Jun 11 '13 at 21:01
  • Aaah, Now that you updated the question it makes sense. You just need to include the `FS` and `OFS` in `BEGIN` block. Do that and you should get two lines of output. – jaypal singh Jun 11 '13 at 21:02
0

Not an answer to your question, but a different way to calculate the time:

if (match($1, /([0-9]?[0-9]):([0-9][0-9]):([0-9][0-9]) ([AP]M)/, a)) {
    $1 = a[1] + (a[2]*60 + a[3])/3600 + (a[4] == "PM" ? 12 : 0)
}
glenn jackman
  • 238,783
  • 38
  • 220
  • 352