I'm wondering how Bio.PDB identifies a residue as a hetero-residue. I know that the residue.id attribute is a tuple in which the first item is the hetero flag, the second is the residue sequence number and the third is the insertion code.
But how does the internal code decide what to put in the hetero flag field? Does it check whether the atoms in the residue are HETATM records vs. ATOM records?
Or does it check the atom names in each residue and compare them to some set of known hetero-atoms?
I ask because in 4MDH chain B, the first residue in the coordinates section is ACE (acetyl). It has only C and O atoms, and the PDB file lists them as HETATM records. Yet the residue.id for this residue is (' ', 0, ' '), with a blank hetero flag.
Here is my code:
>>> from Bio.PDB.mmtf import MMTFParser
>>> structure = MMTFParser.get_structure_from_url('4mdh')
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 0.
PDBConstructionWarning)
/Library/Python/2.7/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 0.
PDBConstructionWarning)
>>> chain = [c for c in structure.get_chains() if c.get_id() == 'B'][0]
>>> residue0 = [r for r in chain.get_residues() if r.id[1] == 0][0]
>>> residue0.id
(' ', 0, ' ')
>>>
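
To make the first possibility concrete, this is roughly the rule I have in mind, written as a standalone sketch that does not use Bio.PDB at all. The helper names hetero_flag_from_record and make_line are my own inventions, the example lines are made up rather than copied from 4MDH, and I am only assuming a parser might work this way. The 'H' and 'W' values follow the hetero flags Bio.PDB documents for hetero-residues and water.

```python
def hetero_flag_from_record(line):
    """Guess a hetero flag from the record type of one PDB coordinate line.

    Hypothetical helper, not part of Bio.PDB: it looks only at the
    record name (columns 1-6) and the residue name (columns 18-20).
    """
    record_type = line[0:6].strip()
    resname = line[17:20].strip()
    if record_type == "HETATM":
        # Water gets its own flag; every other HETATM residue is 'H'.
        return "W" if resname in ("HOH", "WAT") else "H"
    if record_type == "ATOM":
        return " "
    raise ValueError("not a coordinate record: %r" % record_type)


def make_line(record_type, resname):
    """Build the first 26 columns of an illustrative PDB coordinate line."""
    # Columns: record (1-6), serial (7-11), atom name (13-16), altloc (17),
    # residue name (18-20), chain (22), residue number (23-26).
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}".format(
        record_type, 1, " C", resname, "B", 0
    )


for record_type, resname in [("HETATM", "ACE"), ("ATOM", "ALA"), ("HETATM", "HOH")]:
    flag = hetero_flag_from_record(make_line(record_type, resname))
    print(record_type, resname, repr(flag))
```

Under this rule the ACE residue would get 'H' rather than ' ', which is why the result from MMTFParser surprised me.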