
I want to substitute the capital letter B with C in column 5, from line 6 to the end of the file. I need to keep the spacing exactly as it is in my original input file.

ATOM   1939  HG2 PRO A 125      35.681  32.906  38.437  1.00 43.59           H  
ATOM   1940  HG3 PRO A 125      34.593  33.765  37.652  1.00 41.79           H  
ATOM   1941  HD2 PRO A 125      37.364  34.075  37.624  1.00 43.38           H  
ATOM   1942  HD3 PRO A 125      36.333  34.312  36.415  1.00 41.29           H  
TER   
ATOM   1944  N   MET B  11      16.583  29.975  -4.306  1.00 51.32           N  
ATOM   1945  CA  MET B  11      15.542  30.263  -3.327  1.00 39.92           C  
ATOM   1946  C   MET B  11      16.146  30.366  -1.933  1.00 32.50           C  

I have read:

  1. https://unix.stackexchange.com/questions/486840/replace-a-string-with-sed-from-specific-lines
  2. https://unix.stackexchange.com/questions/70878/replacing-string-based-on-line-number
  3. Sed replace pattern with line number

My attempt is:

awk 'NR == 6 && $ == 5, { sub(" B ", " C ") }'

RavinderSingh13
Another.Chemist

5 Answers

This simple awk should do it. Written and tested in GNU awk.

awk '
FNR>=6 && match($0,/^(\S+[[:space:]]+)(\S+[[:space:]]+)(\S+[[:space:]]+)(\S+[[:space:]]+)(\S+)(.*)$/,arr) && arr[5]=="B"{
  $0=arr[1] arr[2] arr[3] arr[4] "C" arr[6]
}
1
'  Input_file

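This uses match() to capture each field together with the whitespace that follows it, then rebuilds $0 from those pieces with "C" in place of field 5, so the original spacing is kept.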
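
The three-argument form of match() and the \S operator are gawk extensions, so the command above will not run in mawk or BWK awk. A minimal sketch of the same idea for any POSIX awk uses the two-argument match(), which sets RLENGTH. It assumes fields are separated by blanks or tabs and that field 5 is followed by at least one more separator (true for the ATOM lines in the question); the wrapper name fix_col5 is hypothetical.

```shell
# Hypothetical wrapper: fix_col5 [file...]  (reads stdin when no file is given)
fix_col5() {
  awk '
  # From line 6 on, match the first four fields plus a field 5 that is exactly "B".
  # Only POSIX features are used: two-argument match() sets RSTART and RLENGTH.
  FNR >= 6 && match($0, /^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+B[ \t]/) {
    # The match ends with "B" plus one separator, so B sits at position RLENGTH - 1.
    $0 = substr($0, 1, RLENGTH - 2) "C" substr($0, RLENGTH)
  }
  1  # print every record, modified or not
  ' "$@"
}
```

Usage: fix_col5 Input_file > Output_file. Because only $0 is assigned and no field is modified, awk never rebuilds the record with OFS.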

RavinderSingh13

You can use any awk and preserve the format by running sub() on the entire record. Modifying $0 directly, instead of assigning to a field, keeps awk from rebuilding the record with OFS, so the original spacing survives. For instance, you can do:

awk 'NR > 5 { sub(/MET B/,"MET C") }1' file

Here awk replaces the first occurrence of "MET B" with "MET C" on every record from the 6th onward, leaving the spacing alone.

Output

ATOM   1939  HG2 PRO A 125      35.681  32.906  38.437  1.00 43.59           H
ATOM   1940  HG3 PRO A 125      34.593  33.765  37.652  1.00 41.79           H
ATOM   1941  HD2 PRO A 125      37.364  34.075  37.624  1.00 43.38           H
ATOM   1942  HD3 PRO A 125      36.333  34.312  36.415  1.00 41.29           H
TER
ATOM   1944  N   MET C  11      16.583  29.975  -4.306  1.00 51.32           N
ATOM   1945  CA  MET C  11      15.542  30.263  -3.327  1.00 39.92           C
ATOM   1946  C   MET C  11      16.146  30.366  -1.933  1.00 32.50           C
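
Note that /MET B/ only matches MET residues, so lines for any other residue in chain B would be left unchanged. If the input is a standard fixed-column PDB file (an assumption: the question only shows ATOM and TER lines), the chain identifier always occupies column 22, and a sketch that rewrites just that one character works in any awk without touching the spacing. The wrapper name set_chain_c is hypothetical.

```shell
# Hypothetical wrapper: set_chain_c [file...]
# Assumes fixed-column PDB records, where the chain ID is the single character in column 22.
set_chain_c() {
  awk '
  # Only ATOM, HETATM and TER records carry a chain ID in column 22.
  FNR >= 6 && /^(ATOM|HETATM|TER)/ && substr($0, 22, 1) == "B" {
    # Rebuild the record as: columns 1-21, the new chain ID, then column 23 to end of line.
    $0 = substr($0, 1, 21) "C" substr($0, 23)
  }
  1
  ' "$@"
}
```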
David C. Rankin

Use NR >= 6 to update every line from line 6 onward; the NR == 6 in your code matches line 6 only.

Pass $5 as the third argument of sub() so the substitution happens only in that field.

awk 'NR >= 6 {sub("B", "C", $5)} 1'

This assumes you don't care about the number of spaces between the columns. Modifying $5 makes awk rebuild each updated line by joining its fields with OFS (a single space by default), so multiple spaces collapse into one, so

ATOM   1944  N   MET B  11      16.583  29.975  -4.306  1.00 51.32           N  

becomes

ATOM 1944 N MET C 11 16.583 29.975 -4.306 1.00 51.32 N
Barmar

If you can use gawk, its split() function accepts an optional fourth argument, a seps array that receives the separator text found between the fields.

You can split on the field separator FS, and then use the number returned by split to loop through all the fields.

When the loop reaches field 5 and its value is B, change it to C. Printing every field followed by its original separator rebuilds the line with the spacing intact.

awk 'NR > 5 {
  nr = split($0, a, FS, seps)   # gawk only: seps[i] is the separator after a[i]
  printf "%s", seps[0]          # leading whitespace, if any
  for (i = 1; i <= nr; ++i) {
    if (i == 5 && a[i] == "B") a[i] = "C"
    printf "%s%s", a[i], seps[i]
  }
  printf "\n"
  next
}1' file

Output

ATOM   1939  HG2 PRO A 125      35.681  32.906  38.437  1.00 43.59           H
ATOM   1940  HG3 PRO A 125      34.593  33.765  37.652  1.00 41.79           H
ATOM   1941  HD2 PRO A 125      37.364  34.075  37.624  1.00 43.38           H
ATOM   1942  HD3 PRO A 125      36.333  34.312  36.415  1.00 41.29           H
TER
ATOM   1944  N   MET C  11      16.583  29.975  -4.306  1.00 51.32           N
ATOM   1945  CA  MET C  11      15.542  30.263  -3.327  1.00 39.92           C
ATOM   1946  C   MET C  11      16.146  30.366  -1.933  1.00 32.50           C
The fourth bird

Here's a way to preserve all spaces and tabs without relying on gawk-specific extensions:

 mawk 'BEGIN {   _ = length(FS="[ \t]+") } NR<_ || NF<_ ||
              $!NF = sprintf("%.*sC%s", (__ = index($!_,
                             $_))- !!_, $!_, substr($!_, ++__))'

Output

ATOM   1939  HG2 PRO A 125      35.681  32.906  38.437  1.00 43.59           H  
ATOM   1940  HG3 PRO A 125      34.593  33.765  37.652  1.00 41.79           H  
ATOM   1941  HD2 PRO A 125      37.364  34.075  37.624  1.00 43.38           H  
ATOM   1942  HD3 PRO A 125      36.333  34.312  36.415  1.00 41.29           H  
TER   
ATOM   1944  N   MET C  11      16.583  29.975  -4.306  1.00 51.32           N  
ATOM   1945  CA  MET C  11      15.542  30.263  -3.327  1.00 39.92           C  
ATOM   1946  C   MET C  11      16.146  30.366  -1.933  1.00 32.50           C  
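
Decoded, the one-liner works as follows. FS is set to runs of blanks or tabs, and _ is set to length("[ \t]+"), which is 5 because \t is a single character inside an awk string. Records with NR < 5 or NF < 5 are printed unchanged (line 5 is skipped only because the TER line has fewer than five fields). For every other record, $!NF is $0 (NF is nonzero, so !NF is 0), and it is rebuilt by locating $5 with index() and writing "C" over its first character. The sketch below spells out the same steps in readable form for any awk; the wrapper name decode_golf is hypothetical. Because index() returns the first occurrence anywhere in the record, and nothing checks that field 5 is B, a line such as "ATOM   1947  CB  MET B  11" (hypothetical, not in the sample) would have the B of CB changed to give CC, while the chain stays B.

```shell
# Hypothetical name; a readable rewrite of the same steps as the golfed one-liner above.
decode_golf() {
  awk '
  BEGIN { FS = "[ \t]+"; col = 5 }        # the golf derives 5 from length("[ \t]+")
  NR < col || NF < col { print; next }    # early or short records pass through unchanged
  {
    pos = index($0, $col)                 # FIRST occurrence of the field 5 text anywhere in $0
    print substr($0, 1, pos - 1) "C" substr($0, pos + 1)   # overwrite that one character
  }
  ' "$@"
}
```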
RARE Kpop Manifesto