2

I previously altered the chain IDs of a PDB file, 6gch, resulting in output that looks like this:

ATOM 1 N CYS G 1 54.142 90.734 71.584 1.00 8.30 N
ATOM 2 CA CYS G 1 53.264 90.010 72.541 1.00 6.56 C
ATOM 3 C CYS G 1 53.418 90.566 73.962 1.00 7.21 C

Using the code:

from Bio.PDB import PDBList, PDBIO, PDBParser

pdbl = PDBList()

io = PDBIO()
parser = PDBParser()
pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")

# pdb6gch.ent is the filename when retrieved by PDBList
structure = parser.get_structure('6gch', 'pdb6gch.ent')

renames = {
    "E": "A",
    "F": "B",
    "G": "C"
}

for model in structure:
    for chain in model:
        old_name = chain.get_id()
        new_name = renames.get(old_name)
        if new_name:
            print(f"renaming chain {old_name} to {new_name}")
            chain.id = new_name
        else:
            print(f"keeping chain name {old_name}")

io.set_structure(structure)
io.save('6gch_renamed.pdb')

I was wondering if I could replace ATOM entries 1, 2 and 3, whose chains have been edited (shown at the start), with the corresponding entries from the original 6gch PDB file.

I am still learning how to code so any and all help would be appreciated.

6gch pdb file - https://files.rcsb.org/download/6GCH.pdb

  • I do not understand what you want to accomplish. Changing those 3 entries back to their original chain would mean you have discontinuous chains. These three backbone atoms would appear to be covalently bonded to a different chain. How would that help? – Lydia van Dyke Dec 08 '21 at 18:58
  • I am attempting to alter a pdb chain atom entries to stretch my understanding of coding and the pdb file format. There is no functional/scientific use for this, it's just me attempting to get better at coding and to see what is possible and what is not. – Pythonstudent Dec 08 '21 at 22:34
  • Go down to ATOM in your parsing loop model - chain - residue - atom and use atom.serial_number == 1 (or 2 or 3) – pippo1980 Dec 08 '21 at 23:02
  • Ouch, I didn't find any way to change an atom's chain ID directly! Moving the atoms to chain E will corrupt the atom number sequence (if E exists or there are chains after the E one) – pippo1980 Dec 09 '21 at 10:52

3 Answers

2

As I explained in the comment, I was a bit puzzled about this request. The reason being that I cannot see any chemistry related motivation in changing the chain assignment. You answered

I am attempting to alter a pdb chain atom entries to stretch my understanding of coding and the pdb file format. There is no functional/scientific use for this, it's just me attempting to get better at coding and to see what is possible and what is not

My response: I suggest you separate your goals of a) understanding PDB and b) understanding coding. To a very good approximation, PDB files are just a list of atom coordinates with some hints about covalent atomic bonds. Much more important is a decent understanding of the chemistry behind all that: what types of bonds are possible and which chemical properties can be derived. For this part, Biopython will not help you that much. Tools like VMD or PyMOL, which let you visualize and play with the proteins described by a PDB file, will be far more useful.

An experiment like "change the first 3 atoms" is best done by hand. Biopython tries to assist with modifications that are relatively easy and structurally motivated.

When you have a problem that you need to solve a couple of times (say more than 5 times), then using Python + Biopython is a possible (and good) way to do it. For this part the answer is: if you can think it, it is possible.
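To make the "by hand" route concrete, here is a minimal sketch that edits the file as plain text, without Biopython. It relies on the fixed-column PDB layout (atom serial in columns 7-11, chain ID in column 22). The function name `set_chain_id` is made up for illustration, and the sample lines are the three atoms from the question re-aligned to the fixed columns, assuming they sit in chain A after the rename.

```python
def set_chain_id(pdb_lines, serials, new_chain):
    """Return a copy of pdb_lines with the chain ID of the selected atoms replaced.

    serials   -- set of atom serial numbers to edit
    new_chain -- single-character chain ID to write into column 22
    """
    edited = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM  ", "HETATM")) and len(line) > 21
        # Columns 7-11 (0-based slice 6:11) hold the atom serial number.
        if is_atom and int(line[6:11]) in serials:
            # Column 22 (0-based index 21) holds the chain ID.
            line = line[:21] + new_chain + line[22:]
        edited.append(line)
    return edited


# Sample records (assumed to be in chain A after the rename step).
lines = [
    "ATOM      1  N   CYS A   1      54.142  90.734  71.584  1.00  8.30           N",
    "ATOM      2  CA  CYS A   1      53.264  90.010  72.541  1.00  6.56           C",
    "ATOM      3  C   CYS A   1      53.418  90.566  73.962  1.00  7.21           C",
    "END",
]

edited = set_chain_id(lines, {1, 2, 3}, "E")
for line in edited:
    print(line)
```

Only the chain column changes, so the line length and every other field stay as they were. Records that are not ATOM/HETATM (such as END) pass through untouched.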

Lydia van Dyke
0

I've got some code that answers your question, though I am sure it's neither the right way nor the fastest. I had to figure out how Biopython manages the PDB structure object, and I'm not sure I got it. I believe every piece of a structure is an object, but I'm not sure about the relations among them. My code outputs a correct PDB file (correct in terms of the question, not of the format: 3 atoms are not a Cys residue, I hope), while the Biopython structure object it creates is somewhat wrong: the 3 atoms from Cys 1-A that you wanted moved don't have a parent. I guess it's because of the way I copied Cys 1-A and detached the atoms only after moving them. Have a look at the code, the output and the print() calls on the atoms throughout my overly long algorithm. I need more time to understand it, but I feel I am getting a grasp on it; maybe you'll catch it faster than me:

PS I started from your recent question answer here: How do I change the chain name of a pdb file?


from Bio.PDB import PDBList, PDBIO, PDBParser

from Bio.PDB.Chain import Chain

import warnings
warnings.filterwarnings('ignore')




def atom_id_total(struct):
    # print(struct)
    id_t = 0
    for model in struct:
        # print(model)
        for chain in model:
            # print(chain)
            for resi in chain:
                for atom in resi:
                    id_t +=1
                    # print(resi)
    return id_t



pdbl = PDBList()

io = PDBIO()
parser = PDBParser()
pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")

# pdb6gch.ent is the filename when retrieved by PDBList
structure = parser.get_structure('6gch', 'pdb6gch.ent')

renames = {
    "E": "A",
    "F": "B",
    "G": "C"
}

for model in structure:
    for chain in model:
        old_name = chain.get_id()
        new_name = renames.get(old_name)
        if new_name:
            print(f"renaming chain {old_name} to {new_name}")
            chain.id = new_name
        else:
            print(f"keeping chain name {old_name}")

io.set_structure(structure)
io.save('6gch_renamed.pdb')

structure2 = parser.get_structure('6gch_renamed', '6gch_renamed.pdb')

for model in structure2:
    print('model :',model, model.id, model.full_id)


# for model in structure2:
#     for chain in model:
#         if chain.id == 'A' :
#           for residue in chain:
#                 for ii in residue:
            
#                       print(atom,atom.serial_number,atom.get_id(),atom.fullname,atom.get_parent())

my_chain = Chain("E")

print('my_chain : ', my_chain, my_chain.id, my_chain.get_parent())



model_list=[]
for model in structure2.get_models():
    print(model)
    model_list.append(model)
    
model_list[0].add(my_chain)


print('my_chain : ', my_chain, type(my_chain), my_chain.get_full_id(), my_chain.get_parent())



print('structure2 :',structure2)


list_resi =[]
list_atom=[]
for model in structure2:
    for chain in model:
        # if chain.id == 'E':
        print(chain.id)
        # print(dir(chain))
        for residue in chain:
            for atom in residue:
                if atom.serial_number in [1,2,3]:
                    print(atom,atom.serial_number,atom.get_id(),atom.fullname,atom.get_parent())
                    list_resi.append(residue)
                    list_atom.append(atom)

    

list_resi = set(list_resi)                   
print('list_resi : ',list_resi)
print('list_atom : ',list_atom)

cnt_resi=0
for i in list_resi:
    print(i.id, i.full_id)
    copi = i.copy()
    print(copi.id, copi.full_id)

    print('copy', copi.id,copi.get_parent())
    setattr(copi, 'id',(copi.id[0], 1 ,(copi.id[2])))
    
    copi.set_parent(my_chain)
    print('copy parent :' , copi.get_parent())
    print('copy', copi.id, copi.get_parent(),'full_id :', copi.get_full_id())
    cnt_resi += 1
    
    
    copi_child = []
    for i in copi.get_list():
        print(i)
        copi_child.append(i.id)
        
    print('copi child : ',copi_child)
    
    for i in copi_child:
        copi.__delitem__(i)
        
        
    print(copi)
    
    
    for i in copi:
        print(i)

cnt_atm = 0
for i in list_atom:
    # setattr(i, 'serial_number', (atom_id_total(structure2))+300+cnt_atm) # changes nothing: io.set_structure(structure2) renumbers the atoms
    setattr(i, 'serial_number', 1) 
    i.set_parent(copi)
    print('atom i :',i.fullname, i.serial_number, i.id, 'parent :',i.get_parent(), 'atom full_id :',i.get_full_id())    
    copi.add(i)
    cnt_atm += 1
    
print('copi :',copi)
for i in copi:
    print(i, i.serial_number)




my_chain.add(copi)

for model in structure2:
    for chain in model:
        if chain.id == 'A' :
          for residue in chain:
                for ii in residue:
                    print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()).get_parent().id)
        if chain.id == 'E' :
          for residue in chain:
                for ii in residue:
                    print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()))


del_atom = []
for model in structure2:
    for chain in model:
        if chain.id == 'A':
            print(chain.id)
            for residue in chain:
                for atom in residue:
                      if atom in list_atom:
                        del_atom.append(atom)


print('del_atom : ',del_atom)
for i in del_atom:
    print(i, i.serial_number, ((i.get_parent()).get_parent()).id)



for i in list_resi:
    del_i = i
    print('del_i :',del_i)
    for ii in i.get_list():
        if ii in del_atom:
            i.detach_child(ii.id)
            print(ii.serial_number)
            if ii in del_atom:
                print('ok')
           

                         
print('____________')
    
# 

for model in structure2:
    for chain in model:
        if chain.id == 'A' :
          for residue in chain:
                for ii in residue:
                    print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()).get_parent().id)
        if chain.id == 'E' :
          for residue in chain:
                for ii in residue:
                    print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()))


print(structure2.child_dict)

for model in structure2:
    print(model.child_dict)



           
io.set_structure(structure2)
io.save('6gch_re-renamed.pdb')              

The image of the output PDB (not reproduced here) shows Cys 1 split in two: the half left in chain A (red) and the half moved to chain E (blue).

Probably it would have been better to have a look here: https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ

especially the "The Structure Object: What's the overall layout of a Structure object?" part, before rushing into typing bits of code.
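For what it's worth, the layout described in that FAQ section (Structure, Model, Chain, Residue, Atom) can be tried out on a toy object without downloading anything. The sketch below is not the code above, just an illustration under stated assumptions: the structure id "demo" and the variable names are made up, and the residue contains only the three atoms quoted in the question. The hierarchy is built top-down, so each entity already has a parent when children are added, and the atoms are detached from the old residue before being added to the new one.

```python
import numpy as np
from Bio.PDB.Atom import Atom
from Bio.PDB.Chain import Chain
from Bio.PDB.Model import Model
from Bio.PDB.Residue import Residue
from Bio.PDB.Structure import Structure

# Build Structure -> Model -> Chain -> Residue top-down.
structure = Structure("demo")  # "demo" is an arbitrary id
model = Model(0)
structure.add(model)
chain_a = Chain("A")
model.add(chain_a)
cys = Residue((" ", 1, " "), "CYS", " ")
chain_a.add(cys)

# The three atoms from the question: name, fullname, element, B-factor, xyz.
atom_data = [
    ("N", " N  ", "N", 8.30, (54.142, 90.734, 71.584)),
    ("CA", " CA ", "C", 6.56, (53.264, 90.010, 72.541)),
    ("C", " C  ", "C", 7.21, (53.418, 90.566, 73.962)),
]
for serial, (name, fullname, element, bfactor, xyz) in enumerate(atom_data, start=1):
    cys.add(Atom(name, np.array(xyz), bfactor, 1.0, " ", fullname, serial, element=element))

# New chain E with its own residue, attached to the model before receiving atoms.
chain_e = Chain("E")
model.add(chain_e)
moved = Residue((" ", 1, " "), "CYS", " ")
chain_e.add(moved)

# Detach each atom from its old residue first, then add it to the new one,
# so every atom ends up with exactly one parent.
for atom in list(cys):  # iterate over a copy because detaching mutates the residue
    cys.detach_child(atom.get_id())
    moved.add(atom)

for chain in model:
    for residue in chain:
        print(chain.get_id(), residue.get_resname(), [a.get_id() for a in residue])
```

In this toy case the residue left in chain A ends up empty, because it only ever held the three moved atoms. In the real file the remaining atoms of Cys 1 would stay behind in chain A.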

pippo1980
0

I found a shorter way, and also added it at the end of the previous code (How do I change the chain name of a pdb file?). The code is long because of all the print statements that track changes in the structure object. I'm not sure it is the shortest, fastest or most orthodox way to do it. It uses

Chain and Residue (from Bio.PDB.Chain import Chain and from Bio.PDB.Residue import Residue).

The only thing I was missing was how to renumber the atoms in the structure object without having to save it to a PDB file. Edit: they are now renumbered before saving. Have a look at it and let me know if it suits your needs:

from Bio.PDB import PDBList, PDBIO, PDBParser

from Bio.PDB.Chain import Chain

from Bio.PDB.Residue import Residue

import warnings
warnings.filterwarnings('ignore')



pdbl = PDBList()

io = PDBIO()
parser = PDBParser()
pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")

# pdb6gch.ent is the filename when retrieved by PDBList
structure = parser.get_structure('6gch', 'pdb6gch.ent')

renames = {
    "E": "A",
    "F": "B",
    "G": "C"
}

for model in structure:
    for chain in model:
        old_name = chain.get_id()
        new_name = renames.get(old_name)
        if new_name:
            print(f"renaming chain {old_name} to {new_name}")
            chain.id = new_name
        else:
            print(f"keeping chain name {old_name}")

io.set_structure(structure)
io.save('6gch_renamed.pdb')

structure2 = parser.get_structure('6gch_renamed', '6gch_renamed.pdb')

# for model in structure2:
#     for chain in model:
#         if chain.id =='A' or chain.id =='E':
#             for residue in chain:
#                 print(residue, residue.get_parent())
#                 for atom in residue:
#                     print(atom, atom.get_parent())


x = Residue((' ',999,' '), 'POP', "") ##see https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ#what-is-a-residue-id

print('new residue X :',x)
                    
str2_atom = structure2.get_atoms()

atoms = []
for i in str2_atom:
    if i.serial_number in [1,2,3]:
        print('1st loop : ',i.serial_number, i.get_full_id())
        atoms.append(i)
        
for i in atoms:
    (i.get_parent()).detach_child(i.id)
    print('2nd loop : ',i.serial_number, i.get_full_id())

print('atoms : ', atoms)


print('detached : __________________________________________')
for model in structure2:
    for chain in model:
        if chain.id =='A' or chain.id =='E':
            for residue in chain:
                print(residue, residue.get_parent())
                for atom in residue:
                    print(atom.serial_number, atom, atom.id, atom.get_parent())


print('before add to new chain : ___________')
for i in atoms:
    # i.set_parent(x)
    print(i.serial_number, i.get_full_id())# i.get_parent(), (i.get_parent()).get_parent())
    
    x.add(i) ## adds atom to residue X ; sets X as i parent

for i in atoms:
    # i.set_parent(x)
    print(i.serial_number, i.get_full_id(), i.get_parent(), (i.get_parent()).get_parent())
my_chain = Chain("E")

print('created new chain : ', my_chain)

my_chain.add(x)

print('after add to new chain : ___________')
for i in atoms:
    #i.set_parent(x)
    print(i.serial_number, i.get_full_id(), i.get_parent(), (i.get_parent()).get_parent())


print('chains of structure model [0] : _______')
print(structure2.child_dict)
for model in structure2:
    print(model.child_dict)

print('add chain E to structure model [0] : _______')
structure2[0].add(my_chain)

print(structure2.child_dict)

for model in structure2:
    print(model.child_dict)

for model in structure2:
    for chain in model:
        if chain.id =='A' or chain.id =='E':
            for residue in chain:
                print(residue, residue.get_parent())
                for atom in residue:
                    print(atom.serial_number, atom, atom.id, atom.get_parent())

# renumber atoms in new structure
atom_N = 1
for model in structure2:
    for chain in model:
        # if chain.id =='A' or chain.id =='E':
            for residue in chain:
                # print(residue, residue.get_parent())
                for atom in residue:
                    # print(atom.serial_number, atom, atom.id, atom.get_parent())
                    setattr(atom, 'serial_number', atom_N)
                    #setattr(copi, 'id',(copi.id[0], 1 ,(copi.id[2])))
                    # print(atom.serial_number, atom, atom.id, atom.get_parent())
                    atom_N += 1
                    
print('\n stucture with renumbered atoms : \n___________________________________')                  
for model in structure2:
    for chain in model:
        if chain.id =='A' or chain.id =='E':
            for residue in chain:
                print(residue, residue.get_parent())
                for atom in residue:
                    print(atom.serial_number, atom, atom.id, atom.get_parent())
        
io.set_structure(structure2)
io.save('6gch_re-renamed.pdb')  
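The renumbering step on its own can be sketched more compactly with `enumerate`. This is only an illustration on a toy structure: the helper name `renumber_atoms`, the structure id "toy", the GLY residues and the all-zero coordinates are placeholders, not data from 6gch.

```python
import numpy as np
from Bio.PDB.Atom import Atom
from Bio.PDB.Chain import Chain
from Bio.PDB.Model import Model
from Bio.PDB.Residue import Residue
from Bio.PDB.Structure import Structure


def renumber_atoms(structure, start=1):
    """Give all atoms consecutive serial numbers in hierarchy order."""
    for serial, atom in enumerate(structure.get_atoms(), start=start):
        atom.set_serial_number(serial)


# Toy structure (placeholder data): serial numbers deliberately out of order.
toy = Structure("toy")
model = Model(0)
toy.add(model)
for chain_id, serials in (("A", [4, 5]), ("E", [1])):
    chain = Chain(chain_id)
    model.add(chain)
    residue = Residue((" ", 1, " "), "GLY", " ")
    chain.add(residue)
    for name, serial in zip(("N", "CA"), serials):
        # np.zeros(3) is a dummy coordinate; only the serial number matters here.
        residue.add(Atom(name, np.zeros(3), 0.0, 1.0, " ", f" {name:<3}", serial, element=name[0]))

renumber_atoms(toy)
for atom in toy.get_atoms():
    print(atom.get_parent().get_parent().get_id(), atom.get_id(), atom.get_serial_number())
```

As far as I can tell, `PDBIO.save` also renumbers atoms when writing unless `preserve_atom_numbering=True` is passed, so the explicit loop mainly matters if you want consistent serial numbers in the in-memory object before saving.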
pippo1980