Remove heteroatoms from PDB

Question

The heteroatoms from pdb file has to be removed. Here is the code but it did not work with my test PDB 1C4R.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

Any suggestion?

xbello · Answer 1 · 2014-09-09T08:49:43.820

The heteroatoms shouldn't be part of the chain. But you can know if a residue is a heteroatom with:

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")

for residue in pdb.get_residues():
    tags = residue.get_full_id()

    # tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))
    # Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)

    # Thus you're interested in the Hetero Field, that is empty if the residue
    # is not a hetero atom or have some flag if it is (W for waters, H, etc.)

    if tags[3][0] != " ":
        # The residue is a heteroatom
    else:
        # It is not

You can also get the id of the residue (without the three first fields) with:

tags = residue.id

# or het_flag,_ ,_ = residue.id

if tags[0] != " ":
    # The residue is a heteroatom
else:
    # It is not

I'm adding a link to the relevant documentation: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

The subject is in the page 8, "What is a residue id?". Quoting:

This is a bit more complicated, due to the clumsy PDB format. A residue id is a tuple with three elements:

The hetero-flag: this is ’H_’ plus the name of the hetero-residue (eg. ’H_GLC’ in the case of a glucose molecule), or ’W’ in the case of a water molecule.

To add comments in and resume:

from Bio.PDB import PDBParser, PDBIO, Select

class NonHetSelect(Select):
    def accept_residue(self, residue):
        return 1 if residue.id[0] == " " else 0

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
io = PDBIO()
io.set_structure(pdb)
io.save("non_het.pdb", NonHetSelect())

Thanks for the post. This detects the heteroatoms and I also want to remove this from the chain. Is there a smart way to remove hetero atoms? — Exchhattu, Sep 09 '14 at 01:24
This is a new question on its own. With my answer and the example in the biopdb_faq.pdf (page 5-6 "Can I write PDB files?"), change the selection code from `if residue.get_name()=="GLY":` to `if residue.id[0] == " ":`. — xbello, Sep 09 '14 at 06:45
I am testing. I will let you know once I finish the testing. — Exchhattu, Sep 11 '14 at 03:59

score 0 · Answer 2 · answered Mar 18 '16 at 21:35

I once used the code "Removing residues" from http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython

It will miss some heteroatoms. I guess it may because that the chain changes every time the detach_child is called.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

After modified as below (just avoid dynamically modifying the iterable), it worked fine for me. (I only used structure[0] here.)

model = structure[0]
residue_to_remove = []
chain_to_remove = []
for chain in model:
    for residue in chain:
        if residue.id[0] != ' ':
            residue_to_remove.append((chain.id, residue.id))
    if len(chain) == 0:
        chain_to_remove.append(chain.id)

for residue in residue_to_remove:
    model[residue[0]].detach_child(residue[1])

for chain in chain_to_remove:
    model.detach_child(chain)

Remove heteroatoms from PDB

2 Answers2

Linked