5

I have a file with records that are of the form:

SMS-MT-FSM-DEL-REP
country: IN
1280363645.979354_PFS_1_1887728354

SMS-MT-FSM-DEL-REP
country: IN
1280363645.729309_PFS_1_1084296392

SMS-MO-FSM
country: IR
1280105721.484103_PFM_1_1187616097

SMS-MO-FSM
country: MO
1280105721.461090_PFM_1_882824215

This lends itself to parsing via awk using something like: awk 'BEGIN { FS="\n"; RS="" } /country:.*MO/ {print $0}'

My question is: how do I use awk to match records on two separate fields? For example, I only want to print records that have a country of MO and whose first line is SMS-MO-FSM.

adaptive

2 Answers

4

If you have set FS="\n" and RS="", then each line of a record is a field: the first field, $1, is the record's first line (here SMS-MO-FSM) and $2 is the country line. Your awk code is therefore:

awk 'BEGIN{FS="\n"; RS=""} $2~/country.*MO/ && $1~/SMS-MO-FSM/ ' file
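A minimal sketch of the same two-field test using exact string comparison instead of unanchored regexes, assuming the fields carry no leading or trailing whitespace. The third record (country: SMO and its ID) is hypothetical and is included only to show a value that the unanchored /country.*MO/ would also match.

```shell
# Two records from the question plus one HYPOTHETICAL record (country: SMO)
# that an unanchored /country.*MO/ regex would match by accident.
records='SMS-MO-FSM
country: IR
1280105721.484103_PFM_1_1187616097

SMS-MO-FSM
country: MO
1280105721.461090_PFM_1_882824215

SMS-MO-FSM
country: SMO
1280105721.000000_PFM_1_1'

# RS="" splits records on blank lines; FS="\n" makes each line a field.
# A pattern with no action prints the whole matching record.
out=$(printf '%s\n' "$records" |
  awk 'BEGIN { FS = "\n"; RS = "" } $1 == "SMS-MO-FSM" && $2 == "country: MO"')
printf '%s\n' "$out"
```

Only the record whose second line is exactly country: MO is printed; the IR and SMO records are skipped.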
ghostdog74
  • Watch out for unwanted regex matches (like `country: SMO`). I would use string comparison whenever possible and anchor all regexes. – schot Aug 09 '10 at 09:45
  • Thank you, but I wonder if you could answer a (probably very simple) follow-up to the last question. I would like to print the result on one line (for piping into sort|uniq). I ran your code and it worked great (thanks), but when I set OFS to " " (space) the fields of the record still came out on separate lines. What am I doing wrong? Here is my code: awk 'BEGIN{FS="\n"; RS=""; OFS=" ";} $2~/country: MO$/ && $1~/SMS-MO-FSM/ {print $0}' testFile.txt – adaptive Aug 09 '10 at 10:03
  • When piping to sort, you need newlines. I don't know how to answer your question since you don't provide enough information about your data. Try setting OFS="\n" and see. – ghostdog74 Aug 09 '10 at 10:35
3

(I'm posting this as a separate answer rather than a comment reply for better formatting.)

Concerning your second remark about printing a record on a single line: as long as you don't modify a record, OFS has no effect on print $0, because awk prints the record exactly as it was read, embedded newlines included. Only when you assign to one of the fields (or to NF) does awk rebuild $0 as $1 OFS $2 OFS ... $NF; print then appends ORS to the result. You can force this reconstruction like this:

BEGIN {
    FS  = "\n"
    RS  = ""
    OFS = ";"     # Or another delimiter that does not appear in your data
    ORS = "\n"
}
$2 ~ /^[ \t]*country:[ \t]*MO[ \t]*$/ && $1 ~ /^[ \t]*SMS-MO-FSM[ \t]*$/ {
    $1 = $1 ""    # This forces the reconstruction
    print
}
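As a rough usage sketch, the reconstruction can be checked on three of the question's records, with the one-line output piped into sort. Filtering on country: IN (rather than MO) and the exact-match test are choices made here only so that more than one line reaches sort; LC_ALL=C is an assumption added to keep the sort order locale-independent.

```shell
# Three records taken from the question.
records='SMS-MT-FSM-DEL-REP
country: IN
1280363645.979354_PFS_1_1887728354

SMS-MT-FSM-DEL-REP
country: IN
1280363645.729309_PFS_1_1084296392

SMS-MO-FSM
country: MO
1280105721.461090_PFM_1_882824215'

# Assigning to $1 makes awk rebuild $0 with OFS between the fields,
# so each matching record is printed on a single line.
out=$(printf '%s\n' "$records" |
  awk 'BEGIN { FS = "\n"; RS = ""; OFS = ";" }
       $2 == "country: IN" { $1 = $1 ""; print }' |
  LC_ALL=C sort)
printf '%s\n' "$out"
```

Each matching record comes out as one semicolon-joined line, which is the shape sort and uniq expect.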
schot