I have a file like the one below. I want to find the line whose code matches 4126 and print only the month and year and the amount (the amount is not always under Jan 2014, as it is in the example below).

awk -F! '/4126/ {print $0}' prints the entire line.

But I need to print only the month/year and the amount, as follows:

Jan 2014
25492.00

A sample from the file is given here.

    +=====================================================================+
    ! Code  !  Jan 2014 !  Feb 2014 !  Mar 2014!    Arrears!  T o t a l s !
    +=====================================================================+
    ! 1101  !  26290.00 !  26290.00 !  26290.00!      0.00 !  3,15,480.00 !
    ! 1102  !    480.00 !    480.00 !    480.00!      0.00 !     5,760.00 !
    ! 2104  !  24213.09 !  25198.97 !  25198.97!      0.00 !  2,73,205.69 !
    ! 2107  !      0.00 !      0.00 !      0.00!      0.00 !    14,991.20 !
    ! 2113  !    275.00 !    275.00 !    275.00!      0.00 !     3,300.00 !
    ! 4114  !      0.00 !      0.00 !   1106.00!      0.00 !     4,424.00*!
    ! 4123  !   4667.00 !      0.00 !      0.00!      0.00 !     4,667.00 !
    ! 4126  !  25492.00 !      0.00 !      0.00!      0.00 !    25,492.00*!

Please suggest an awk command to do this. Thanks in advance.

adam1969in
  • I need to match the code 4126 and then print only the amount paid and the month/year of payment. Jan 2014 and 25492.00 are only an example; the payment is not always in January 2014. @karakfa has provided the right answer, but maybe someone else has a briefer one. – adam1969in Dec 26 '15 at 02:35
  • You'll never know since you selected an answer instead of editing your question to clarify it so now very few people will read your question and those who do will see it's unclear and so not bother trying to come up with a different answer. – Ed Morton Dec 26 '15 at 14:27

3 Answers

You're almost there. $0 is the whole line; you need a specific field (and the header).

$ awk -F! 'NR==2{h=$3} $2~/\y4126\y/{print h; print $3}' file

Jan 2014 
25492.00 

Your sample output prints the value from the previous line. If that's not a typo, you should save the previous line and print it after a match.

To eliminate false matches, restrict the pattern to the code field and add word boundaries (\y is specific to GNU awk).
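To see why restricting the match to the code field matters, here is a small sketch. The three-row sample and the file name sample.txt are made up for illustration; the 1101 row deliberately has an amount that contains 4126. Instead of \y, it splits on "!" with optional surrounding spaces and compares the code field exactly, which should also work in awks other than gawk.

```shell
# Hypothetical sample: the 1101 row contains "4126" inside an amount.
cat > sample.txt <<'EOF'
! Code  !  Jan 2014 !  Feb 2014 !
! 1101  !   4126.00 !      0.00 !
! 4126  !  25492.00 !      0.00 !
EOF

# Unanchored match: counts every line that contains 4126 anywhere.
loose=$(awk '/4126/ {n++} END {print n+0}' sample.txt)

# Field match: counts only rows whose code field is exactly 4126.
exact=$(awk -F' *! *' '$2 == "4126" {n++} END {print n+0}' sample.txt)

echo "loose=$loose exact=$exact"
```

With this sample the unanchored pattern also picks up the 1101 row, while the field comparison selects only the 4126 row.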

To print all nonzero amounts you can do the following

$ awk -F! 'NR==2{h[3]=$3; h[4]=$4; h[5]=$5}
   $2~/\y2104\y/{for(i=3;i<=5;i++) 
                    if($i!=0) 
                       {header=header OFS h[i]; 
                        line=line OFS $i
                       } 
                print header;
                print line}' file 


   Jan 2014    Feb 2014    Mar 2014
   24213.09    25198.97    25198.97
karakfa
  • Yes you are right, there is a typo. 25492.00 is correct. But you have given print $3 directly. I have already said that it is not always the same case. It differs in month/year for every employee and not always in Jan 2014 in the example provided. And the file is too big. – adam1969in Dec 25 '15 at 17:40
  • Your edited answer gives me what I want, @karakfa. Just one request, sir: as I have up to 22 headers, I can't go on defining them as h[3]=$3, h[4]=$4 and so on. Is there a shortcut, maybe a for loop, to store the headers? – adam1969in Dec 25 '15 at 18:14
  • You can write a loop through the header fields as in `for(i=3;i<=24;i++) h[i]=$i` or use `split($0,h,FS)` – karakfa Dec 25 '15 at 18:39
  • After using the for loop, changing OFS to ORS, and removing the curly braces around `header=header OFS.....line=line OFS $i`, I was able to get it right. – adam1969in Dec 26 '15 at 08:36
  • You should mention it's gawk-specific due to `\y`. – Ed Morton Dec 26 '15 at 14:29

It's VERY unclear whether you want to print the value from a certain column, the value from the column named "Jan 2014", the values across all columns along with the header of the column each is found in, or something else. But MAYBE this is what you want:

$ awk -F' *! *' -v tgt=4123 -v col=3 'NR==2{hdr=$col} $2==tgt{print hdr ORS $col}' file
Jan 2014
4667.00

$ awk -F' *! *' -v tgt=2104 -v col=4 'NR==2{hdr=$col} $2==tgt{print hdr ORS $col}' file
Feb 2014
25198.97

Given your new requirements:

$ cat tst.awk
BEGIN { FS=" *! *"; OFS="\t" }
NR==2 { split($0,hdrs) }
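$2==tgt {
    # Start fresh for each matching row.
    hdr = txt = ""
    for (i=3;i<(NF-1);i++) {
        if ($i != 0) {
            hdr = (hdr == "" ? "" : hdr OFS) hdrs[i]
            txt = (txt == "" ? "" : txt OFS) $i
        }
    }
    # Print here, not in a separate "txt" rule, which would fire again on every later line.
    if (txt != "") print hdr ORS txt
}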

$ awk -v tgt=4126 -f tst.awk file
Jan 2014
25492.00

$ awk -v tgt=2104 -f tst.awk file
Jan 2014        Feb 2014        Mar 2014
24213.09        25198.97        25198.97

The above will work in any awk and will only produce output when the target value is found (i.e. will not print blank lines or anything else if the target value is not found).

Actually - after reading your comment under @karakfa's answer, this may be what you want:

$ cat tst.awk
BEGIN { FS=" *! *"; OFS="\t" }
NR==2 { split($0,hdrs) }
$2==tgt {
    for (i=3;i<(NF-1);i++) {
        if ($i!=0) {
            print hdrs[i] ORS $i
        }
    }
}

$ awk -v tgt=2104 -f tst.awk file
Jan 2014
24213.09
Feb 2014
25198.97
Mar 2014
25198.97

You could have saved us guessing if you had provided an example that produces output from multiple columns.

Ed Morton
awk '$4~/Jan/{print $4, $5};$4~/4667.00/{print $4}' file
 Jan 2014
 4667.00

Since I don't define a field separator, awk uses its default and splits on whitespace. So if field 4 matches Jan, print fields 4 and 5. Likewise, if field 4 matches 4667.00, print field 4.
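To make the field numbering concrete, the sketch below prints the first five whitespace-separated fields of the header row and of one data row. The two rows are copied from the sample in the question; the file name rows.txt is just a placeholder. Note that this numbering only lines up when the month of interest sits in the first amount column.

```shell
cat > rows.txt <<'EOF'
! Code  !  Jan 2014 !  Feb 2014 !  Mar 2014!    Arrears!  T o t a l s !
! 4123  !   4667.00 !      0.00 !      0.00!      0.00 !     4,667.00 !
EOF

# With the default FS, every "!" surrounded by spaces is a field of its own,
# so "Jan" lands in $4 and "2014" in $5 on the header row.
fields=$(awk '{print $1 "|" $2 "|" $3 "|" $4 "|" $5}' rows.txt)
echo "$fields"
```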

Claes Wikner