awk: change a field's value conditionally based on the value of another column

Question

I have a table snp150Common.txt, where the second and third fields $2 and $3 can be equal or not.

If they are equal, I want $2 to become $2-1, so that:

chr1    10177   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10352   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

becomes:

chr1    10176   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10351   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

My current command, adapted from https://askubuntu.com/a/312843:

zcat < snp150/snp150Common.txt.gz | head | awk '{ if ($2 == $3) $2=$2-1; print $0 }' | cut -f 2,3,4,5,8,9,10,12,16

returns the input unchanged:

chr1    10177   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10352   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

Any help is greatly appreciated.

Your `cut` (which is pretty foo as you are already using awk) indicates that the `snp150Common.txt` has more columns than you show above. Are you sure `$2` and `$3` are the columns you really want to compare? — James Brown, May 07 '18 at 03:31

score 1 · Accepted Answer · answered May 07 '18 at 04:52

This answer is based on pure speculation about the source file format:

$ zcat snp150/snp150Common.txt.gz | 
  awk '
  BEGIN { OFS="\t" }                       # field separators are most likely tabs
  {
      if ($3 == $4)                        # based on cut these should be compared
          $3=$3-1
      print $2,$3,$4,$5,$8,$9,$10,$12,$16  # ... and these fields printed
  }
  NR==10 { exit }'                         # this replaces head
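
For what it's worth, a likely reason the original command changed nothing: if the `cut` list reflects the real layout, the columns shown in the question are fields 2 onward of the file, so `$2 == $3` was comparing the chromosome name with the start position, and those never match. Below is a minimal sketch on the trimmed layout from the question, where fields 2 and 3 are the right ones to compare. The function name `shift_equal_start` is made up for illustration, the two sample rows are taken from the question and cut down to four columns, and tab-separated input is an assumption.

```shell
# Hypothetical helper (not part of the original answer).
# Reads tab-separated rows on stdin and decrements field 2
# whenever it equals field 3; all other rows pass through untouched.
shift_equal_start() {
    awk 'BEGIN { FS = OFS = "\t" }   # assume tabs on input, keep tabs on output
         $2 == $3 { $2 = $2 - 1 }    # only touch rows where start == end
         { print }'
}

# Two rows from the question, trimmed to four columns.
printf 'chr1\t10177\t10177\trs367896724\nchr1\t11007\t11008\trs575272151\n' |
    shift_equal_start
```

Setting `OFS` matters here: assigning to a field makes awk rebuild the line using `OFS`, which defaults to a single space, so a later `cut -f` would no longer find tabs on the modified rows.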

And remember: Practising (anything but sucking) makes you suck less.
