How does Bio.PDB identify hetero-residues?

Question

I'm wondering how Bio.PDB identifies a residue as a hetero-residue. I know that the residue.id attribute holds a tuple in which the first item is the hetero flag, the second is the residue identifier (number) and the third is the insertion code.

But how does the internal code decide what to put in the hetero flag field? Does it check whether the atoms in the residue are HETATM records vs. ATOM records?

Or does it check the atom names in each residue and compare them to some set of hetero-atoms?

The reason I ask is that in 4MDH chain B, the first residue in the coordinates section is ACE (acetyl). It has only C and O atoms, and the PDB file lists it as a HETATM. But the residue.id for this residue is (' ', 0, ' '), i.e. the hetero flag is blank.

Here is my code:

>>> from Bio.PDB.mmtf import MMTFParser
>>> structure = MMTFParser.get_structure_from_url('4mdh')
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 0.
  PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 0.
  PDBConstructionWarning)
>>> chain = [c for c in structure.get_chains() if c.get_id() == 'B'][0]
>>> residue0 = [r for r in chain.get_residues() if r.id[1] == 0][0]
>>> residue0.id
(' ', 0, ' ')
>>>

score 2 · Answer 1 · answered May 10 '18 at 22:34

TL;DR: It's not BioPython but the mmtf library which does the interpretation.

From the source code:

self.structure_bulder.init_residue(group_name, self.this_type,
                                   group_number, insertion_code)

Here the residue is created. The second parameter (self.this_type) is passed as the field (hetero flag) argument of init_residue:

def init_residue(self, resname, field, resseq, icode):
    """Create a new Residue object.
    Arguments:
     - resname - string, e.g. "ASN"
     - field - hetero flag, "W" for waters, "H" for hetero residues, otherwise blank.

In the MMTF parser, this_type is set once for the whole chain in set_chain_info, so every residue in a chain gets the same hetero flag.

If you load the same structure with mmtf, you can see that chains 0 and 1 are considered polymers, which BioPython interprets as 'regular' residues. That makes sense, since the acetyl group is bound to the peptide chain.

from mmtf import fetch
decoded_data = fetch("4mdh")
print(decoded_data.entity_list)

[{'chainIndexList': [0, 1],
  'description': 'CYTOPLASMIC MALATE DEHYDROGENASE',
  'sequence': 'XSE...SSA',
  'type': 'polymer'},
 {'chainIndexList': [2, 4],
  'description': 'SULFATE ION',
  'sequence': '',
  'type': 'non-polymer'},
 {'chainIndexList': [3, 5],
  'description': 'NICOTINAMIDE-ADENINE-DINUCLEOTIDE',
  'sequence': '',
  'type': 'non-polymer'},
 {'chainIndexList': [6, 7],
  'description': 'water',
  'sequence': '',
  'type': 'water'}]
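
To make the per-chain behaviour concrete, here is a minimal sketch of how an entity type could be translated into a hetero flag for each chain index. It is an illustration, not the BioPython source: the helper name hetero_flags_by_chain and the lookup table are hypothetical, and the mapping (polymer to blank, non-polymer to "H", water to "W") is an assumption based on the init_residue docstring quoted above and on the output seen in the question.

```python
# Sketch only: hypothetical helper, not part of Biopython or mmtf.
# Assumed mapping from MMTF entity type to Bio.PDB hetero flag.
ENTITY_TYPE_TO_HETERO_FLAG = {
    "polymer": " ",      # regular residue, blank flag
    "non-polymer": "H",  # hetero residue
    "water": "W",        # water
}


def hetero_flags_by_chain(entity_list):
    """Return {chain_index: hetero_flag} for an MMTF-style entity list."""
    flags = {}
    for entity in entity_list:
        flag = ENTITY_TYPE_TO_HETERO_FLAG[entity["type"]]
        # Every chain of the entity, and so every residue in it,
        # receives the same flag.
        for chain_index in entity["chainIndexList"]:
            flags[chain_index] = flag
    return flags


# Trimmed copy of the 4MDH entity list printed above.
entity_list = [
    {"chainIndexList": [0, 1], "type": "polymer"},
    {"chainIndexList": [2, 4], "type": "non-polymer"},
    {"chainIndexList": [3, 5], "type": "non-polymer"},
    {"chainIndexList": [6, 7], "type": "water"},
]

flags = hetero_flags_by_chain(entity_list)
for chain_index in sorted(flags):
    print(chain_index, repr(flags[chain_index]))
```

Under these assumptions, ACE ends up with a blank flag simply because it sits in a polymer chain; its HETATM record in the PDB file plays no role.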

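Note that you can access models, chains and residues in BioPython by their ids, e.g. structure[0]['B'][0] would give you the same residue as in the question. An integer residue key such as 0 is shorthand for the full id (' ', 0, ' '), so this lookup works here precisely because the hetero flag is blank.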
