
I want to rename the chains of a PDB file '6gch' - https://www.rcsb.org/structure/6GCH.

I have checked the Biopython manual and can't seem to find anything. Any input would be of great help!

1 Answer


You can indeed just change the id attribute of the chain elements. After that you can use PDBIO to save the modified structure.

Note, however, that this process modifies the PDB file quite a bit. PDBIO does not write records such as REMARK, SHEET and SSBOND, so be careful if you need those. It also moves the HETATM records to the end of their corresponding chain, whereas the original PDB file had them at the end of the file.

As 6GCH has 3 chains, I am using the dictionary `renames` to map old chain names to new ones. A chain whose name is not a key in this dict keeps its name.

from Bio.PDB import PDBList, PDBIO, PDBParser

pdbl = PDBList()

io = PDBIO()
parser = PDBParser()
pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")

# pdb6gch.ent is the filename when retrieved by PDBList
structure = parser.get_structure('6gch', 'pdb6gch.ent')

renames = {
    "E": "A",
    "F": "B",
    "G": "C"
}

for model in structure:
    for chain in model:
        old_name = chain.get_id()
        new_name = renames.get(old_name)
        if new_name:
            print(f"renaming chain {old_name} to {new_name}")
            chain.id = new_name
        else:
            print(f"keeping chain name {old_name}")

io.set_structure(structure)
io.save('6gch_renamed.pdb')
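
If the REMARK, SHEET and SSBOND records mentioned above must survive, one alternative is to skip Biopython and edit the file as text. The following is only a sketch under stated assumptions, not a Biopython feature: the helper names `rename_chains_in_pdb_text` and `rename_chains_in_pdb_file` are hypothetical, new chain names are assumed to be single characters, and only the chain ID in column 22 of ATOM, HETATM, ANISOU and TER records is rewritten. Other records that carry chain IDs in different columns (SEQRES, HELIX, SHEET, SSBOND and so on) are copied unchanged, so they would still refer to the old names unless they are handled as well.

```python
# Record types whose chain ID sits in column 22 (string index 21).
COORD_RECORDS = {"ATOM", "HETATM", "ANISOU", "TER"}


def rename_chains_in_pdb_text(lines, renames):
    """Return a copy of `lines` with chain IDs in coordinate records remapped."""
    renamed = []
    for line in lines:
        record = line[:6].strip()
        if record in COORD_RECORDS and len(line) > 21:
            old_id = line[21]
            # Chains that are not keys of `renames` keep their ID.
            line = line[:21] + renames.get(old_id, old_id) + line[22:]
        renamed.append(line)
    return renamed


def rename_chains_in_pdb_file(src, dst, renames):
    """Read `src`, remap chain IDs and write the result to `dst`."""
    with open(src) as handle:
        lines = handle.readlines()
    with open(dst, "w") as handle:
        handle.writelines(rename_chains_in_pdb_text(lines, renames))
```

A call such as `rename_chains_in_pdb_file('pdb6gch.ent', '6gch_renamed.pdb', renames)` would then keep every non-coordinate record as it was. Because each line is mapped exactly once, a swap like `{"E": "G", "G": "E"}` needs no intermediate names in this approach.
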
Lydia van Dyke
  • Thank you so much! I was wondering if there is functionality in Biopython that would enable swapping chain_id "E" to "G" and "G" to "E", even though the ids are already used for sibling identity. – Pythonstudent Dec 07 '21 at 12:29
  • As the for-loop iterates over each chain exactly once, you should just need to set `renames = {"E": "G", "G": "E"}` to swap the chain names. – Lydia van Dyke Dec 07 '21 at 13:09
  • When I try the code I get the following error: [ValueError: Cannot change id from `E` to `G`. The id `G` is already used for a sibling of this entity.] – Pythonstudent Dec 07 '21 at 15:01
  • That's a pity. A clumsy workaround would be to use intermediate chain names: `renames_list = [{"E": "g", "G": "e"}, {"g": "G", "e": "E"}]` and then place a `for renames in renames_list:` in front of the model loop. – Lydia van Dyke Dec 07 '21 at 15:34
  • What also seems to work (but I just eyeballed the resulting PDB) is to add the line `chain.parent = None` in the inner for-loop. This disables the check for already used ids. – Lydia van Dyke Dec 07 '21 at 15:39
  • Why does `pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")` retrieve a .ent file instead of a .pdb file, while `file_format="xml"` retrieves a .xml file? – pippo1980 Dec 07 '21 at 17:33
  • Once again, thanks for answering my question above. I was wondering if there is a way to replace the atom entries of the 6gch PDB file with the newly edited atom and chain entries, so that the resulting PDB file contains all the 6gch information in addition to the chain edits. – Pythonstudent Dec 08 '21 at 16:57
  • @LydiavanDyke where does the check for already used ids occur: in `PDBIO.save` or in `chain.id = 'xx'`? – pippo1980 Aug 08 '22 at 15:57
  • Is it here: https://github.com/biopython/biopython/blob/master/Bio/PDB/Entity.py#L163 ? – pippo1980 Aug 08 '22 at 16:26
  • @pippo1980 yes, that seems to be the place. The check is in line 175 and the following line raises the same ValueError as mentioned by you in December :) – Lydia van Dyke Aug 08 '22 at 19:35
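
The intermediate-name workaround from the comments above can be wrapped in a small helper. This is a sketch rather than a Biopython function: the name `rename_chains` and the `__tmp_N__` placeholder ids are made up for illustration, and it assumes that assigning `chain.id` triggers the sibling check discussed above. Every chain to be renamed is first parked under a unique placeholder id, so no target id is occupied when the final names are assigned.

```python
def rename_chains(model, renames):
    """Rename the chains of one model in two passes.

    Supports swaps such as {"E": "G", "G": "E"}. `model` is expected to be
    iterable over chain objects with a settable `id` attribute.
    """
    pending = []
    # Pass 1: move each affected chain to a unique placeholder id.
    for index, chain in enumerate(list(model)):
        new_id = renames.get(chain.id)
        if new_id and new_id != chain.id:
            chain.id = f"__tmp_{index}__"  # hypothetical placeholder format
            pending.append((chain, new_id))
    # Pass 2: assign the final ids. This still raises ValueError if a target
    # id belongs to a chain that is not being renamed itself.
    for chain, new_id in pending:
        chain.id = new_id
```

With the `structure` from the answer, it could be called as `for model in structure: rename_chains(model, {"E": "G", "G": "E"})` before `io.set_structure(structure)`.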