I want to renumber the residues of a PDB file continuously. The file has multiple chains (A, H, L), and some residues have insertion codes attached to their position (e.g., 190A). Can anybody help me write this code?
Example of a PDB file with insertion codes
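In the fixed-column PDB format the insertion code is a separate field and is not part of the residue number. In ATOM/HETATM records the chain ID is column 22, the residue sequence number (resSeq) is columns 23-26 and the insertion code (iCode) is column 27. The following is a minimal sketch that reads those fields from one record. The helper name `residue_key` is hypothetical.

```python
from typing import Tuple


def residue_key(line: str) -> Tuple[str, int, str]:
    """Return (chain_id, res_seq, insertion_code) from an ATOM/HETATM record.

    PDB columns are 1-based: chain ID = 22, resSeq = 23-26, iCode = 27,
    so the 0-based Python indices are [21], [22:26] and [26].
    """
    chain_id = line[21]
    res_seq = int(line[22:26])  # int() ignores the padding spaces
    icode = line[26] if len(line) > 26 else " "  # blank when absent
    return chain_id, res_seq, icode


# Illustrative record built column by column (same fields as a 6A residue)
record = "ATOM  " + "   30" + "  N   " + "ILE E" + "   6" + "A" + "   " + "  46.802"
print(residue_key(record))  # ('E', 6, 'A')
```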

Have a look here: https://stackoverflow.com/questions/70279008/how-to-replace-pdb-atom-entries-with-an-altered-pdb-file-that-just-contains-atom/70311342#70311342. In the first answer, look at the code under the "# renumber atoms in new structure" block. – pippo1980 Mar 10 '22 at 17:29
So that 29, 29A ---> 1, 1A? – pippo1980 Mar 10 '22 at 18:11
My attempt using Biopython:
Input file (testA.pdb):
ATOM 25 N ALA E 5 48.087 97.950 74.514 1.00 9.33 N
ATOM 26 CA ALA E 5 48.052 99.292 73.904 1.00 9.37 C
ATOM 27 C ALA E 5 47.483 100.285 74.935 1.00 9.65 C
ATOM 28 O ALA E 5 47.693 101.493 74.908 1.00 9.11 O
ATOM 29 CB ALA E 5 47.247 99.339 72.623 1.00 8.31 C
ATOM 30 N ILE E 6 46.802 99.657 75.862 1.00 9.99 N
ATOM 31 CA ILE E 6 46.118 100.279 77.004 1.00 10.34 C
ATOM 32 C ILE E 6 46.521 99.491 78.253 1.00 10.35 C
ATOM 33 O ILE E 6 46.292 98.274 78.348 1.00 9.61 O
ATOM 34 CB ILE E 6 44.613 100.230 76.772 1.00 11.05 C
ATOM 35 CG1 ILE E 6 44.269 100.841 75.413 1.00 11.39 C
ATOM 36 CG2 ILE E 6 43.845 100.913 77.879 1.00 11.06 C
ATOM 37 CD1 ILE E 6 42.926 100.408 74.814 1.00 11.29 C
ATOM 30 N ILE E 6A 46.802 99.657 75.862 1.00 9.99 N
ATOM 31 CA ILE E 6A 46.118 100.279 77.004 1.00 10.34 C
ATOM 32 C ILE E 6A 46.521 99.491 78.253 1.00 10.35 C
ATOM 33 O ILE E 6A 46.292 98.274 78.348 1.00 9.61 O
ATOM 34 CB ILE E 6A 44.613 100.230 76.772 1.00 11.05 C
ATOM 35 CG1 ILE E 6A 44.269 100.841 75.413 1.00 11.39 C
ATOM 36 CG2 ILE E 6A 43.845 100.913 77.879 1.00 11.06 C
ATOM 37 CD1 ILE E 6A 42.926 100.408 74.814 1.00 11.29 C
ATOM 38 N GLN E 7 47.184 100.177 79.159 1.00 10.08 N
ATOM 39 CA GLN E 7 47.750 99.648 80.383 1.00 10.85 C
ATOM 40 C GLN E 7 46.749 99.311 81.476 1.00 10.94 C
ATOM 41 O GLN E 7 45.812 100.068 81.762 1.00 10.33 O
ATOM 42 CB GLN E 7 48.855 100.550 80.962 1.00 11.19 C
ATOM 43 CG GLN E 7 50.227 100.292 80.353 1.00 11.71 C
ATOM 44 CD GLN E 7 50.656 101.322 79.346 1.00 12.04 C
ATOM 45 OE1 GLN E 7 50.015 101.625 78.348 1.00 11.94 O
ATOM 46 NE2 GLN E 7 51.811 101.943 79.591 1.00 12.40 N
ATOM 47 N PRO E 8 46.990 98.145 82.066 1.00 11.13 N
ATOM 48 CA PRO E 8 46.204 97.689 83.212 1.00 11.66 C
ATOM 49 C PRO E 8 46.688 98.594 84.352 1.00 11.77 C
ATOM 50 O PRO E 8 47.885 98.899 84.409 1.00 11.72 O
ATOM 51 CB PRO E 8 46.586 96.236 83.432 1.00 11.66 C
ATOM 52 CG PRO E 8 47.935 96.031 82.787 1.00 11.65 C
ATOM 53 CD PRO E 8 48.114 97.207 81.829 1.00 11.20 C
My code:
from Bio.PDB import PDBIO, PDBParser
# to work with some non orthodox pdbs
import warnings
warnings.filterwarnings('ignore')  # silence parser warnings about non-standard records

io = PDBIO()
parser = PDBParser()

# my_pdb_structure = parser.get_structure('test', 'test.pdb')
my_pdb_structure = parser.get_structure('test', 'testA.pdb')
print(my_pdb_structure)

# renumber residue in my_pdb_structure
# residue.id is a tuple (hetfield, resseq, icode), e.g. (' ', 6, 'A') for residue 6A
residue_N = 1
for model in my_pdb_structure:
    for chain in model:
        for residue in chain:
            print(residue.id)
            if 'A' in residue.id[2]:
                # insertion residue (e.g. 6A): reuse the number given to the previous residue
                residue.id = (residue.id[0], residue_N - 1, residue.id[2])
                print('----', residue.id)
            else:
                # ordinary residue: take the next number, keep the (blank) insertion code
                residue.id = (residue.id[0], residue_N, residue.id[2])
                print('----', residue.id)
                residue_N += 1

# this bit just prints the renumbered my_pdb_structure
print('\n structure with renumbered residues : \n___________________________________')
for model in my_pdb_structure:
    for chain in model:
        for residue in chain:
            print(residue, residue.id)

io.set_structure(my_pdb_structure)
# io.save('renumbered.pdb')
io.save('renumberedA.pdb', preserve_atom_numbering=True)
Output (renumberedA.pdb):
ATOM 25 N ALA E 1 48.087 97.950 74.514 1.00 9.33 N
ATOM 26 CA ALA E 1 48.052 99.292 73.904 1.00 9.37 C
ATOM 27 C ALA E 1 47.483 100.285 74.935 1.00 9.65 C
ATOM 28 O ALA E 1 47.693 101.493 74.908 1.00 9.11 O
ATOM 29 CB ALA E 1 47.247 99.339 72.623 1.00 8.31 C
ATOM 30 N ILE E 2 46.802 99.657 75.862 1.00 9.99 N
ATOM 31 CA ILE E 2 46.118 100.279 77.004 1.00 10.34 C
ATOM 32 C ILE E 2 46.521 99.491 78.253 1.00 10.35 C
ATOM 33 O ILE E 2 46.292 98.274 78.348 1.00 9.61 O
ATOM 34 CB ILE E 2 44.613 100.230 76.772 1.00 11.05 C
ATOM 35 CG1 ILE E 2 44.269 100.841 75.413 1.00 11.39 C
ATOM 36 CG2 ILE E 2 43.845 100.913 77.879 1.00 11.06 C
ATOM 37 CD1 ILE E 2 42.926 100.408 74.814 1.00 11.29 C
ATOM 30 N ILE E 2A 46.802 99.657 75.862 1.00 9.99 N
ATOM 31 CA ILE E 2A 46.118 100.279 77.004 1.00 10.34 C
ATOM 32 C ILE E 2A 46.521 99.491 78.253 1.00 10.35 C
ATOM 33 O ILE E 2A 46.292 98.274 78.348 1.00 9.61 O
ATOM 34 CB ILE E 2A 44.613 100.230 76.772 1.00 11.05 C
ATOM 35 CG1 ILE E 2A 44.269 100.841 75.413 1.00 11.39 C
ATOM 36 CG2 ILE E 2A 43.845 100.913 77.879 1.00 11.06 C
ATOM 37 CD1 ILE E 2A 42.926 100.408 74.814 1.00 11.29 C
ATOM 38 N GLN E 3 47.184 100.177 79.159 1.00 10.08 N
ATOM 39 CA GLN E 3 47.750 99.648 80.383 1.00 10.85 C
ATOM 40 C GLN E 3 46.749 99.311 81.476 1.00 10.94 C
ATOM 41 O GLN E 3 45.812 100.068 81.762 1.00 10.33 O
ATOM 42 CB GLN E 3 48.855 100.550 80.962 1.00 11.19 C
ATOM 43 CG GLN E 3 50.227 100.292 80.353 1.00 11.71 C
ATOM 44 CD GLN E 3 50.656 101.322 79.346 1.00 12.04 C
ATOM 45 OE1 GLN E 3 50.015 101.625 78.348 1.00 11.94 O
ATOM 46 NE2 GLN E 3 51.811 101.943 79.591 1.00 12.40 N
ATOM 47 N PRO E 4 46.990 98.145 82.066 1.00 11.13 N
ATOM 48 CA PRO E 4 46.204 97.689 83.212 1.00 11.66 C
ATOM 49 C PRO E 4 46.688 98.594 84.352 1.00 11.77 C
ATOM 50 O PRO E 4 47.885 98.899 84.409 1.00 11.72 O
ATOM 51 CB PRO E 4 46.586 96.236 83.432 1.00 11.66 C
ATOM 52 CG PRO E 4 47.935 96.031 82.787 1.00 11.65 C
ATOM 53 CD PRO E 4 48.114 97.207 81.829 1.00 11.20 C
TER 53 PRO E 4
END
The code just loads the PDB file with PDBParser(), loops over the structure object and changes the id of each residue, starting from 1 and adding 1 for every residue without an insertion code (a residue with an A reuses the previous number). It then saves the renumbered structure with PDBIO() (first set_structure(), then save()).
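The same rule can be written as a pure function over Biopython-style residue ids, i.e. `(hetfield, resseq, icode)` tuples, so it can be tested without a PDB file. This is a sketch and not the answer's exact code. The function name `renumber_keep_icode` is hypothetical. Unlike the `'A' in residue.id[2]` check, it treats any non-blank insertion code (A, B, ...) as belonging to the previous number.

```python
from typing import List, Tuple

ResidueId = Tuple[str, int, str]  # (hetfield, resseq, icode), as in Bio.PDB


def renumber_keep_icode(ids: List[ResidueId], start: int = 1) -> List[ResidueId]:
    """Renumber residues consecutively, keeping insertion codes.

    A residue with a blank insertion code gets the next number; a residue
    with any insertion code reuses the last number handed out (the same
    idea as residue_N - 1 in the loop above).
    """
    new_ids: List[ResidueId] = []
    next_n = start
    for hetfield, _old_resseq, icode in ids:
        if icode != " ":
            # e.g. 6A right after 6: reuse the number given to 6
            new_ids.append((hetfield, next_n - 1, icode))
        else:
            new_ids.append((hetfield, next_n, icode))
            next_n += 1
    return new_ids


ids = [(" ", 5, " "), (" ", 6, " "), (" ", 6, "A"), (" ", 7, " "), (" ", 8, " ")]
print(renumber_keep_icode(ids))
# [(' ', 1, ' '), (' ', 2, ' '), (' ', 2, 'A'), (' ', 3, ' '), (' ', 4, ' ')]
```

In the Biopython loop each new tuple would be assigned back with `residue.id = new_id`. Like the original, this still assumes an insertion residue comes right after its base residue.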
I don't know the internals of Biopython's PDB parsing and structure objects. My code works only on inputs like your test input, that is, where the residue with an A always comes right after the same residue without the A. You can run tests with different input PDBs to check it out.
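One way to drop the assumption that the A residue always follows the plain one is to give each original (chain, hetfield, residue number) its own new number the first time it appears, and to keep the insertion code. The following is a minimal sketch under that assumption. The function name `renumber_by_resseq` is hypothetical. The counter runs on across chains, as `residue_N` does in the loop above, which is never reset per chain.

```python
from typing import Dict, List, Tuple

# (chain_id, hetfield, resseq, icode): a chain id plus a Bio.PDB-style residue id
Key = Tuple[str, str, int, str]


def renumber_by_resseq(keys: List[Key], start: int = 1) -> List[Key]:
    """Give each distinct (chain_id, hetfield, resseq) a new consecutive number.

    Insertion codes are kept, so 82, 82A, 82B share one new number, and a
    residue such as 27A still gets its own number when 27 itself is absent.
    """
    mapping: Dict[Tuple[str, str, int], int] = {}
    next_n = start
    out: List[Key] = []
    for chain_id, hetfield, resseq, icode in keys:
        base = (chain_id, hetfield, resseq)
        if base not in mapping:
            # first time this residue number is seen: hand out the next number
            mapping[base] = next_n
            next_n += 1
        out.append((chain_id, hetfield, mapping[base], icode))
    return out


keys = [
    ("H", " ", 82, " "), ("H", " ", 82, "A"), ("H", " ", 82, "B"),
    ("H", " ", 83, " "), ("L", " ", 27, "A"), ("L", " ", 28, " "),
]
print(renumber_by_resseq(keys))
# [('H', ' ', 1, ' '), ('H', ' ', 1, 'A'), ('H', ' ', 1, 'B'), ('H', ' ', 2, ' '), ('L', ' ', 3, 'A'), ('L', ' ', 4, ' ')]
```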
Following your comment, with the same input as above you can get this output:
ATOM 25 N ALA E 1 48.087 97.950 74.514 1.00 9.33 N
ATOM 26 CA ALA E 1 48.052 99.292 73.904 1.00 9.37 C
ATOM 27 C ALA E 1 47.483 100.285 74.935 1.00 9.65 C
ATOM 28 O ALA E 1 47.693 101.493 74.908 1.00 9.11 O
ATOM 29 CB ALA E 1 47.247 99.339 72.623 1.00 8.31 C
ATOM 30 N ILE E 2 46.802 99.657 75.862 1.00 9.99 N
ATOM 31 CA ILE E 2 46.118 100.279 77.004 1.00 10.34 C
ATOM 32 C ILE E 2 46.521 99.491 78.253 1.00 10.35 C
ATOM 33 O ILE E 2 46.292 98.274 78.348 1.00 9.61 O
ATOM 34 CB ILE E 2 44.613 100.230 76.772 1.00 11.05 C
ATOM 35 CG1 ILE E 2 44.269 100.841 75.413 1.00 11.39 C
ATOM 36 CG2 ILE E 2 43.845 100.913 77.879 1.00 11.06 C
ATOM 37 CD1 ILE E 2 42.926 100.408 74.814 1.00 11.29 C
ATOM 30 N ILE E 3 46.802 99.657 75.862 1.00 9.99 N
ATOM 31 CA ILE E 3 46.118 100.279 77.004 1.00 10.34 C
ATOM 32 C ILE E 3 46.521 99.491 78.253 1.00 10.35 C
ATOM 33 O ILE E 3 46.292 98.274 78.348 1.00 9.61 O
ATOM 34 CB ILE E 3 44.613 100.230 76.772 1.00 11.05 C
ATOM 35 CG1 ILE E 3 44.269 100.841 75.413 1.00 11.39 C
ATOM 36 CG2 ILE E 3 43.845 100.913 77.879 1.00 11.06 C
ATOM 37 CD1 ILE E 3 42.926 100.408 74.814 1.00 11.29 C
ATOM 38 N GLN E 4 47.184 100.177 79.159 1.00 10.08 N
ATOM 39 CA GLN E 4 47.750 99.648 80.383 1.00 10.85 C
ATOM 40 C GLN E 4 46.749 99.311 81.476 1.00 10.94 C
ATOM 41 O GLN E 4 45.812 100.068 81.762 1.00 10.33 O
ATOM 42 CB GLN E 4 48.855 100.550 80.962 1.00 11.19 C
ATOM 43 CG GLN E 4 50.227 100.292 80.353 1.00 11.71 C
ATOM 44 CD GLN E 4 50.656 101.322 79.346 1.00 12.04 C
ATOM 45 OE1 GLN E 4 50.015 101.625 78.348 1.00 11.94 O
ATOM 46 NE2 GLN E 4 51.811 101.943 79.591 1.00 12.40 N
ATOM 47 N PRO E 5 46.990 98.145 82.066 1.00 11.13 N
ATOM 48 CA PRO E 5 46.204 97.689 83.212 1.00 11.66 C
ATOM 49 C PRO E 5 46.688 98.594 84.352 1.00 11.77 C
ATOM 50 O PRO E 5 47.885 98.899 84.409 1.00 11.72 O
ATOM 51 CB PRO E 5 46.586 96.236 83.432 1.00 11.66 C
ATOM 52 CG PRO E 5 47.935 96.031 82.787 1.00 11.65 C
ATOM 53 CD PRO E 5 48.114 97.207 81.829 1.00 11.20 C
TER 53 PRO E 5
END
To get it, just replace the renumbering block of code with this one:
# renumber residue in my_pdb_structure
residue_N = 1
for model in my_pdb_structure:
    for chain in model:
        for residue in chain:
            print(residue.id)
            # every residue gets the next number; the insertion code is replaced by a blank
            residue.id = (residue.id[0], residue_N, " ")
            print('----', residue.id)
            residue_N += 1
This renumbers all residues consecutively starting from 1, counting on across chains, and removes the A (or any other insertion code letter) from the PDB.
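There is a caveat with renumbering through `residue.id`. In recent Biopython versions, assigning an id that another residue in the same chain still uses raises a `ValueError`. For example, in a chain already numbered 1 to 6, 6A, 7, residue 6A would need to become 7 while the old 7 is still there. A common workaround is to renumber in two passes: first to temporary numbers, then to the final ones. Another option is to rewrite the fixed PDB columns directly, without Biopython. This is a different technique from the answer's, and it is sketched here under some assumptions. Every distinct residue gets the next number and the insertion-code column is blanked. The function name `renumber_pdb_lines` is hypothetical. The sketch assumes a single MODEL and fewer than 10000 residues.

```python
from typing import Dict, Iterable, Iterator, Tuple


def renumber_pdb_lines(lines: Iterable[str], start: int = 1) -> Iterator[str]:
    """Continuously renumber residues in fixed-column PDB text.

    Each distinct residue (resName, chain ID, resSeq, iCode) gets the next
    integer in order of first appearance, so 6, 6A, 7 -> n, n+1, n+2, and
    the insertion-code column is blanked. The counter is NOT reset between
    chains. Assumes a single MODEL and that resSeq fits in 4 columns.
    """
    mapping: Dict[Tuple[str, str, str, str], int] = {}
    next_n = start
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        if body[:6] in ("ATOM  ", "HETATM", "TER   ") and len(body) >= 26:
            body = body.ljust(27)  # short TER records may stop before column 27
            key = (body[17:20], body[21], body[22:26], body[26])
            if key not in mapping:
                mapping[key] = next_n
                next_n += 1
            # columns 23-26: new resSeq, right-justified; column 27: blank iCode
            body = body[:22] + f"{mapping[key]:>4}" + " " + body[27:]
        yield body + ending
```

For example, `dst.writelines(renumber_pdb_lines(src))`, with `src` and `dst` being the input and output files opened for reading and writing. HETATM ligands and waters are renumbered too. Depending on what you need, that may or may not be desirable.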

Thank you for your comment! The renumbering I want to do is continuous, which means the 6A in your example should become 3 instead of 2A. Can you please help me with that? – Radz Mar 11 '22 at 05:10
@Radz I updated my answer; don't forget to accept it if it suits your needs and helped you solve your problem. I am sure there are plenty of other ways (e.g. using other parsers or writing your own). – pippo1980 Mar 11 '22 at 15:11