gawk/sed: find a line and replace the 3rd column

Question

I have a file:

rs4648841   chr1    2365885 --  A   T   0.40095 0.228978043022122   chr1:2523811
rs4648843   chr1    2366316 --  T   C   0.15694 0.5736208829426915  chr1:2523811
rs61763906  chr1    2366517 --  A   G   0.07726 0.5566728930776897  chr1:2523811

I need to find "rs4648843" in the first column and, once I have found the line containing it, change the 4th column of that line to "ADS" (sed or gawk, it doesn't matter which).

Tried (but of course it did not work):

sed '/rs4648843/p' input | sed 's//ADD/g'

EDIT: I need no new file to be created, I want to edit a file I have already.

What have you tried so far? Also, just to clarify, since `rs12123` never occurs in the first column in your sample input, you want nothing changed in the corresponding sample output? — John1024, Apr 06 '15 at 19:42
wrt `EDIT: I need no new file to be created, I want to edit a file I have already.` - that is not possible with sed or awk and is extraordinarily difficult to accomplish in UNIX (hint: you need to use `dd`). Why is that a requirement? — Ed Morton, Apr 06 '15 at 20:59
@EdMorton, `ed` can edit files programmatically without creating temp files. — glenn jackman, Apr 06 '15 at 21:06
Hmm, well, yeah, kinda. `ed` uses a buffer which is the size of the original file so it's no different than using a temp file. It's like reading the whole input file into an array in awk, then closing the input file, and then processing the array writing the contents back to the original file. Yeah you can do it but it doesn't save you any memory or have any other benefits over a tmp file. — Ed Morton, Apr 06 '15 at 22:28

score 1 · Answer 1 · answered Apr 06 '15 at 19:56

Try this:

awk '/^rs4648843/ {$4="ADS"}1' file | column -t

Output:

rs4648841   chr1  2365885  --   A  T  0.40095  0.228978043022122   chr1:2523811
rs4648843   chr1  2366316  ADS  T  C  0.15694  0.5736208829426915  chr1:2523811
rs61763906  chr1  2366517  --   A  G  0.07726  0.5566728930776897  chr1:2523811

John1024 · Answer 2 · 2015-04-06T21:39:43.687

Using awk

Assuming that your input file is tab-separated:

$ awk -v OFS="\t" '$1=="rs4648843"{$4="ADS"} 1' file
rs4648841       chr1    2365885 --      A       T       0.40095 0.228978043022122       chr1:2523811
rs4648843       chr1    2366316 ADS     T       C       0.15694 0.5736208829426915      chr1:2523811
rs61763906      chr1    2366517 --      A       G       0.07726 0.5566728930776897      chr1:2523811

To change the existing file:

awk -v OFS="\t" '$1=="rs4648843"{$4="ADS"} 1' file >file.tmp && mv file.tmp file
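The temp-file pattern above can be wrapped in a small shell function. This is only a minimal sketch, assuming tab-separated input: the function name `set_col4`, its argument order, and the use of `mktemp` are illustrative assumptions and not part of the original answer.

```shell
# Hypothetical helper: set column 4 to a new value on the row whose
# first column equals the given id, rewriting the file via a temp file.
set_col4() {
    file=$1 id=$2 new=$3
    tmp=$(mktemp) || return 1
    # FS and OFS are both tab, so untouched fields keep their content
    if awk -F '\t' -v OFS='\t' -v id="$id" -v new="$new" \
           '$1 == id { $4 = new } 1' "$file" > "$tmp"; then
        mv "$tmp" "$file"      # replace the original only if awk succeeded
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a throwaway file holding two rows of the sample data
demo=$(mktemp)
printf 'rs4648841\tchr1\t2365885\t--\tA\n'  > "$demo"
printf 'rs4648843\tchr1\t2366316\t--\tT\n' >> "$demo"
set_col4 "$demo" rs4648843 ADS
cat "$demo"
```

Because `mv` swaps in a new file, the original's permissions and hard links are not carried over; if that matters, `cat "$tmp" > "$file"` followed by `rm` is one alternative.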

Using sed

Again, assuming tab-separated input, to change the file in-place:

sed -i -r '/^rs4648843/ {s/(([^\t]*\t){3})[^\t]+/\1ADS/}' file

The above was tested on GNU sed. For OSX (BSD) sed, try:

sed -i .bak -E '/^rs4648843/ {s/(([^\t]*\t){3})[^\t]+/\1ADS/;}' file

Using awk but passing in the `rs...` value as a variable

awk -v rs="rs4648843" -v OFS="\t" '$1==rs{$4="ADS"} 1' file
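If there are several identifiers to update (the comments mention an array of them), one option is to hand the whole list to awk and make a single pass over the file, rather than running awk once per identifier. A sketch, assuming tab-separated input and identifiers that contain no spaces; the variable names `ids` and `want` and the choice of the two identifiers are illustrative.

```shell
# Sample data (tab-separated), written to a throwaway file
sample=$(mktemp)
printf 'rs4648841\tchr1\t2365885\t--\tA\n'   > "$sample"
printf 'rs4648843\tchr1\t2366316\t--\tT\n'  >> "$sample"
printf 'rs61763906\tchr1\t2366517\t--\tA\n' >> "$sample"

# Space-separated list of identifiers to update (illustrative choice)
ids="rs4648843 rs61763906"

result=$(awk -F '\t' -v OFS='\t' -v ids="$ids" '
    # Build a lookup table keyed by identifier
    BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] }
    $1 in want { $4 = "ADS" }   # exact match on the first column
    1                           # print every line, changed or not
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```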

Using sed With a String With a Slash

As per the comments, suppose that, instead of ADS, we want to substitute in TRAF6-RAG1/2. Since this contains a / character, it will confuse the sed command given above. There are two possible solutions: one is to escape the / with a backslash. This works as follows:

sed -r '/^rs4648843/ {s/(([^\t]*\t){3})[^\t]+/\1TRAF6-RAG1\/2/}' file

The other solution is to use a different marker for the substitution command. sed's substitution commands are often written in the form s/old/new/ but other markers besides / are possible. As an example, the following uses a vertical bar, |, as the marker instead of /, and thus accommodates the new string:

sed -r '/^rs4648843/ {s|(([^\t]*\t){3})[^\t]+|\1TRAF6-RAG1/2|}' file
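For comparison, the awk approach does not run into this delimiter problem, because the replacement is assigned as a plain string rather than embedded in an s/old/new/ expression. A sketch, assuming tab-separated input; the variable name `new` is illustrative. One caveat: awk's `-v` option processes backslash escapes in the value, so a replacement containing backslashes would need extra care.

```shell
# One tab-separated sample row
line=$(printf 'rs4648843\tchr1\t2366316\t--\tT')

# The slash in the new value needs no escaping here
out=$(printf '%s\n' "$line" |
      awk -F '\t' -v OFS='\t' -v new="TRAF6-RAG1/2" \
          '$1 == "rs4648843" { $4 = new } 1')

printf '%s\n' "$out"
```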

Is it possible to pass "rs4648843" to awk as a variable? I am programming in Ruby and I have an array consisting of "rs222",... and I need to pass each value to awk — Alina, Apr 06 '15 at 20:31
@Tonja OK. I added a version that uses awk's `-v` option to pass in the `rs...` value as a variable. — John1024, Apr 06 '15 at 20:37
it seems that sed for GNU throws an error if I have instead of "ADS" a "TRAF6-RAG1/2" — Alina, Apr 06 '15 at 21:18
@Tonja I updated the answer with two solutions for that. The problem is that, with sed, one has to be careful anytime one of the substitution strings has a sed-active character which, in that case, is the slash: `/`. — John1024, Apr 06 '15 at 21:42
@Tonja why are you calling awk from Ruby? Doesn't Ruby have its own language/functionality for manipulating text? — Ed Morton, Apr 07 '15 at 02:46

glenn jackman · Answer 3 · 2015-04-06T21:16:27.077

Assuming your data is, as it appears to be, fixed width:

gawk -v item=rs4648843 '
    BEGIN {
        FIELDWIDTHS="12 8 8 4 4 4 8 20 12"
        OFS=""
        pattern = "^" item "\\>"   # \> is gawk's end-of-word anchor
    }
    $1 ~ pattern {$4 = sprintf("%-4s", "ADS")} 
    1
' file

rs4648841   chr1    2365885 --  A   T   0.40095 0.228978043022122   chr1:2523811
rs4648843   chr1    2366316 ADS T   C   0.15694 0.5736208829426915  chr1:2523811
rs61763906  chr1    2366517 --  A   G   0.07726 0.5566728930776897  chr1:2523811

To edit the file in-place, we can fall back on ed:

rs="rs4648843"
ed file <<END_ED
g/^$rs\>/ s/^\(\([^[:blank:]]\+[[:blank:]]\+\)\{3\}\)[^[:blank:]]\+/\1ADS/
w
q
END_ED

The lengthy regular expression captures the first 3 whitespace-separated words, together with the whitespace that follows each, and replaces the 4th word with "ADS".
