I need to do an exact match followed by a partial match and retrieve the strings from two columns. I would ideally like to do this with awk.

Input:

k141_18046_1    k141_18046_1
k141_18046_1    k141_18046_2
k141_18046_2    k141_18046_1
k141_12033_1    k141_18046_2
k141_12033_1    k141_12033_1
k141_12033_2    k141_12033_2
k141_2012_1     k141_2012_1
k141_2012_1     k141_2012_2
k141_2012_2     k141_2012_1
k141_21_1     k141_2012_2
k141_21_1       k141_21_1
k141_21_2       k141_21_2

Expected output:

k141_18046_1    k141_18046_2
k141_18046_2    k141_18046_1
k141_2012_1     k141_2012_2
k141_2012_2     k141_2012_1

In both columns, the first part of the ID (everything before the last underscore) must be the same. I need the rows where the two IDs differ only in the final part, i.e. where ID_1 and ID_2, or ID_2 and ID_1, appear in the same row.

Thank you, Susheel

Wiktor Stribiżew
Susheel Busi

1 Answer


Updated based on comments:

$ awk '
$1!=$2 {                     # consider only rows whose two IDs differ
    n=split($1,a,/_/)        # split both IDs on underscores
    m=split($2,b,/_/)
    if(m==n) {               # both IDs must have the same number of parts
        for(i=1;i<n;i++)
            if(a[i]!=b[i])   # all parts but the last must be equal,
                next         # otherwise skip the row
    } else
        next
    print                    # the row passed every check
}' file

Output:

k141_18046_1    k141_18046_2
k141_18046_2    k141_18046_1
k141_2012_1     k141_2012_2
k141_2012_2     k141_2012_1
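To try this without creating a file, here is a minimal sketch that wraps the same program in a shell function (`pairs_same_prefix` is a hypothetical name) and feeds it the sample rows through a here-document, with the two rows quoted in the comments (`k141_13612_2 k141_5573_1` and `k141_13612_2 k141_19887_1`) appended. It passes `"_"` as a string separator instead of `/_/`; both split on underscores.

```shell
#!/bin/sh
# pairs_same_prefix: hypothetical wrapper around the awk program above.
# Prints rows whose two IDs differ but share every "_"-separated part except the last.
pairs_same_prefix() {
    awk '
    $1 != $2 {
        n = split($1, a, "_")         # parts of the first ID
        m = split($2, b, "_")         # parts of the second ID
        if (m != n)
            next                      # different number of parts: skip
        for (i = 1; i < n; i++)
            if (a[i] != b[i])
                next                  # a non-final part differs: skip
        print
    }' "$@"
}

# Sample rows from the question plus the two rows quoted in the comments.
pairs_same_prefix <<'EOF'
k141_18046_1    k141_18046_1
k141_18046_1    k141_18046_2
k141_18046_2    k141_18046_1
k141_12033_1    k141_18046_2
k141_12033_1    k141_12033_1
k141_12033_2    k141_12033_2
k141_2012_1     k141_2012_1
k141_2012_1     k141_2012_2
k141_2012_2     k141_2012_1
k141_21_1       k141_2012_2
k141_21_1       k141_21_1
k141_21_2       k141_21_2
k141_13612_2    k141_5573_1
k141_13612_2    k141_19887_1
EOF
```

This should print the same four rows as the output above; the two rows from the comments are skipped because their middle parts (`13612` vs `5573`/`19887`) differ.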

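Another awk, using match(). The first comparison checks each ID up to and including its last underscore; the second checks the part after it: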

$ awk '
substr($1,match($1,/^.*_/),RLENGTH) == substr($2,match($2,/^.*_/),RLENGTH) && 
substr($1,match($1,/[^_]*$/),RLENGTH) != substr($2,match($2,/[^_]*$/),RLENGTH)
' file
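The match() version relies on match() setting RLENGTH before substr() reads it within the same argument list. A sketch that avoids depending on that evaluation order strips the last `_<suffix>` with sub() into temporary variables and compares the remaining prefixes; `same_prefix_diff_suffix` is a hypothetical name. For IDs that contain at least one underscore, it should select the same rows.

```shell
#!/bin/sh
# same_prefix_diff_suffix: hypothetical helper name for this sketch.
# Keeps rows whose IDs match up to the last "_" but differ after it.
same_prefix_diff_suffix() {
    awk '{
        p1 = $1; sub(/_[^_]*$/, "", p1)   # first ID without its last "_<suffix>"
        p2 = $2; sub(/_[^_]*$/, "", p2)   # second ID without its last "_<suffix>"
        if (p1 == p2 && $1 != $2)         # same prefix, so the suffixes must differ
            print
    }' "$@"
}

# Usage: same_prefix_diff_suffix file
printf '%s\n' 'k141_2012_1 k141_2012_2' 'k141_21_1 k141_2012_2' 'k141_21_1 k141_21_1' |
    same_prefix_diff_suffix
```

The regex `_[^_]*$` can only start at the last underscore, because nothing after the matched `_` may be another underscore, so sub() removes exactly the final part.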
James Brown
  • Sorry, my bad. The input file has other matching IDs that I had not described earlier; I'll update my input description. Terribly sorry! – Susheel Busi Jul 06 '20 at 11:05
  • Sorry, but I edited my original sample description. I also have values like `k141_13612_2 k141_5573_1` and `k141_13612_2 k141_19887_1`. – Susheel Busi Jul 06 '20 at 11:12