1

hello i am trying to compare file1 with file2,

  1. it compares from file1 column $1,$2 and from file2 $1,$3
  2. if there is match it will output those from file1 and file2 to output file where during comparison is copied column $1,$2,$7,$9 from file1 and from file2 during comparison is copied column $1,$3,$6,$7,$8 from file2 to resulting output file
  3. if there is not match it will add from file2 remaining non matched columns to output file as well
  4. at the end it will add in resulted incremented value as in output file as is in column $5

i partialy moved with it with this one

awk 'NR==FNR {a[$1,$3]=$0; next}
             {if(($1,$3) in a)
             {print a[$1,$3],$0; delete a[$1,$2]}
             else print $0}
     END    {for(k in a) print a[k]}' file1 file2

file1

SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7f FEX 98a7df9asd7f_a     
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7g FEX 98a7df9asd7f_b     
SITE-A SERV-A AA 1.00 PPA IP 98a7df9asd7h FEX 98a7df9asd7f_c     
SITE-B SERV-A BB 1.00 DF IP a7sdf9899hhh FEX a7sdf9899hhh_a     
SITE-B SERV-A BB 1.00 DF IP a7sdf9899hhf FEX a7sdf9899hhh_b     
SITE-B SERV-A BB 1.00 AF IP a7sdf9899hhm FEX a7sdf9899hhh_c 

file2

SITE-A 17 SERV-A 0 39 idx a7sdf9899778 0 0 out_fan pri
SITE-A 17 SERV-A 1 1 test a7sdf9899779 1 0 out_fan pri
SITE-A 17 SERV-A 2 32 dummy_host a7sdf9899770 2 0 out_fan pri
SITE-C 22 SERV-A 2 519 dummy_host a7sdf9899772 2 2 out_fan pri  
SITE-C 22 SERV-A 3 520 prod a7sdf9899775 3 out_fan pri  
SITE-C 22 SERV-A 4 521 dev a7sdf9899774 4 out_fan pri 

desired output:

SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A 98a7df9asd7f_a 98a7df9asd7f 3
SITE-A SERV-A 98a7df9asd7f_b 98a7df9asd7g 4
SITE-A SERV-A 98a7df9asd7f_c 98a7df9asd7h 5
SITE-B SERV-A a7sdf9899hhh_a a7sdf9899hhh 0
SITE-B SERV-A a7sdf9899hhh_b a7sdf9899hhf 1
SITE-B SERV-A a7sdf9899hhh_c a7sdf9899hhm 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4 
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
da4fdsa6
  • 85
  • 7

1 Answers1

1
$ cat tst.awk
NR==FNR {
    key = $1 FS $3
    a[key] = a[key] key OFS $6 OFS $7 OFS $8 ORS
    cnt[key]++      # or cnt[key] = $8 + 1
    next
}
{
    key = $1 FS $2
    if ( key != prev ) {
        printf "%s", a[key]
        delete a[key]
        prev = key
    }
    print key, $6, $7, $8, cnt[key]++
}
END {
    for ( key in a ) {
        printf "%s", a[key]
    }
}

.

$ awk -f tst.awk file2 file1
SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A IP 98a7df9asd7f FEX 3
SITE-A SERV-A IP 98a7df9asd7g FEX 4
SITE-A SERV-A IP 98a7df9asd7h FEX 5
SITE-B SERV-A IP a7sdf9899hhh FEX 0
SITE-B SERV-A IP a7sdf9899hhf FEX 1
SITE-B SERV-A IP a7sdf9899hhm FEX 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4

It's not clear if you want the 5th output field for the file1 lines to start from a count of lines from file2 for the given key or the value of $8 from file2 so I included both options, one as a comment.

The for ( key in a ) in the END will print the remaining blocks of lines from file2 in a "random" order (see https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal), if that's a problem you just need to keep a separate array of keys with an incrementing index while file2 is being read (e.g. if (!(key in a)) keys[++numKeys]=key at the start) and use that in the END section to get the keys in that order (for (keynr=1; keyNr<=numKeys; keyNr++) { key=keys[keyNr] ...), i.e.:

$ cat tst.awk
NR==FNR {
    key = $1 FS $3
    if ( !(key in a) ) {
        keys[++numKeys] = key
    }
    a[key] = a[key] key OFS $6 OFS $7 OFS $8 ORS
    cnt[key]++
    next
}
{
    key = $1 FS $2
    if ( key != prev ) {
        printf "%s", a[key]
        delete a[key]
        prev = key
    }
    print key, $6, $7, $8, cnt[key]++
}
END {
    for ( keyNr=1; keyNr<=numKeys; keyNr++ ) {
        key = keys[keyNr]
        printf "%s", a[key]
    }
}

.

$ awk -f tst.awk file2 file1
SITE-A SERV-A idx a7sdf9899778 0
SITE-A SERV-A test a7sdf9899779 1
SITE-A SERV-A dummy_host a7sdf9899770 2
SITE-A SERV-A IP 98a7df9asd7f FEX 3
SITE-A SERV-A IP 98a7df9asd7g FEX 4
SITE-A SERV-A IP 98a7df9asd7h FEX 5
SITE-B SERV-A IP a7sdf9899hhh FEX 0
SITE-B SERV-A IP a7sdf9899hhf FEX 1
SITE-B SERV-A IP a7sdf9899hhm FEX 2
SITE-C SERV-A dummy_host a7sdf9899772 2
SITE-C SERV-A prod a7sdf9899775 3
SITE-C SERV-A dev a7sdf9899774 4
Ed Morton
  • 188,023
  • 17
  • 78
  • 185