2

The heteroatoms from pdb file has to be removed. Here is the code but it did not work with my test PDB 1C4R.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

Any suggestion?

gboffi
  • 22,939
  • 8
  • 54
  • 85
Exchhattu
  • 197
  • 3
  • 15

2 Answers2

4

The heteroatoms shouldn't be part of the chain. But you can know if a residue is a heteroatom with:

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")

for residue in pdb.get_residues():
    tags = residue.get_full_id()

    # tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))
    # Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)

    # Thus you're interested in the Hetero Field, that is empty if the residue
    # is not a hetero atom or have some flag if it is (W for waters, H, etc.)

    if tags[3][0] != " ":
        # The residue is a heteroatom
    else:
        # It is not

You can also get the id of the residue (without the three first fields) with:

tags = residue.id

# or het_flag,_ ,_ = residue.id

if tags[0] != " ":
    # The residue is a heteroatom
else:
    # It is not

I'm adding a link to the relevant documentation: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

The subject is in the page 8, "What is a residue id?". Quoting:

This is a bit more complicated, due to the clumsy PDB format. A residue id is a tuple with three elements:

  • The hetero-flag: this is ’H_’ plus the name of the hetero-residue (eg. ’H_GLC’ in the case of a glucose molecule), or ’W’ in the case of a water molecule.

To add comments in and resume:

from Bio.PDB import PDBParser, PDBIO, Select

class NonHetSelect(Select):
    def accept_residue(self, residue):
        return 1 if residue.id[0] == " " else 0

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
io = PDBIO()
io.set_structure(pdb)
io.save("non_het.pdb", NonHetSelect())
xbello
  • 7,223
  • 3
  • 28
  • 41
  • Thanks for the post. This detects the heteroatoms and I also want to remove this from the chain. Is there a smart way to remove hetero atoms? – Exchhattu Sep 09 '14 at 01:24
  • This is a new question on its own. With my answer and the example in the biopdb_faq.pdf (page 5-6 "Can I write PDB files?"), change the selection code from `if residue.get_name()=="GLY":` to `if residue.id[0] == " ":`. – xbello Sep 09 '14 at 06:45
  • I am testing. I will let you know once I finish the testing. – Exchhattu Sep 11 '14 at 03:59
0

I once used the code "Removing residues" from http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython

It will miss some heteroatoms. I guess it may because that the chain changes every time the detach_child is called.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)
        if len(chain) == 0:
            model.detach_child(chain.id)

After modified as below (just avoid dynamically modifying the iterable), it worked fine for me. (I only used structure[0] here.)

model = structure[0]
residue_to_remove = []
chain_to_remove = []
for chain in model:
    for residue in chain:
        if residue.id[0] != ' ':
            residue_to_remove.append((chain.id, residue.id))
    if len(chain) == 0:
        chain_to_remove.append(chain.id)

for residue in residue_to_remove:
    model[residue[0]].detach_child(residue[1])

for chain in chain_to_remove:
    model.detach_child(chain)
Young
  • 631
  • 6
  • 6