I need to extract single chains from a structure file in mmCIF format, as available from the PDB. I've read several related questions, such as this and this. The proposed solution works well if the chain ID is an integer or a single character. However, when applied to a structure such as 6KMW to extract chain aA, it raises the error TypeError: %c requires int or char. The full code used to reproduce the error and its output are included below.
from Bio.PDB import PDBList, PDBIO, FastMMCIFParser, Select

class ChainSelect(Select):
    def __init__(self, chain):
        self.chain = chain

    def accept_chain(self, chain):
        if chain.get_id() == self.chain:
            return 1
        else:
            return 0


pdbl = PDBList()
io = PDBIO()
parser = FastMMCIFParser(QUIET=True)
pdbl.retrieve_pdb_file('6kmw', pdir='.', file_format='mmCif')
structure = parser.get_structure('6kmw', '6kmw.cif')
io.set_structure(structure)
io.save('6kmw_aA.pdb', ChainSelect('aA'))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-095b98a12800> in <module>
     18 structure = parser.get_structure('6kmw', '6kmw.cif')
     19 io.set_structure(structure)
---> 20 io.save('6kmw_aA.pdb', ChainSelect('aA'))

~/miniconda3/envs/lab2/lib/python3.8/site-packages/Bio/PDB/PDBIO.py in save(self, file, select, write_end, preserve_atom_numbering)
    368 )
    369
--> 370 s = get_atom_line(
    371 atom,
    372 hetfield,

~/miniconda3/envs/lab2/lib/python3.8/site-packages/Bio/PDB/PDBIO.py in _get_atom_line(self, atom, hetfield, segid, atom_number, resname, resseq, icode, chain_id, charge)
    227 charge,
    228 )
--> 229 return _ATOM_FORMAT_STRING % args
    230
    231 else:

TypeError: %c requires int or char
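
For what it's worth, the final TypeError seems reproducible without Biopython. Below is a minimal sketch, assuming the chain ID is what ends up in a %c field of _ATOM_FORMAT_STRING (I have not checked the format string itself). The name format_chain_id is a hypothetical helper used only for illustration.

```python
def format_chain_id(chain_id):
    """Format a chain ID through a '%c' field; return the error text on failure."""
    try:
        # '%c' accepts an int (code point) or a string of length exactly 1.
        return "%c" % chain_id
    except TypeError as exc:
        # The exact message wording depends on the Python version.
        return f"TypeError: {exc}"


print(format_chain_id("A"))   # single character: formatted as-is
print(format_chain_id("aA"))  # two characters: reports a TypeError
```

If that assumption holds, any chain ID longer than one character would fail in the same way, not only aA.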
Is anyone aware of Biopython functionality that achieves this? Preferably something that doesn't rely on parsing the entire file with custom functions.
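
One direction I have not verified on 6KMW is writing the chain back out as mmCIF instead of PDB, since the PDB format reserves only a single column for the chain ID. Below is a rough sketch, assuming Bio.PDB.MMCIFIO accepts the same Select-based filtering as PDBIO. The name save_chain_as_mmcif is a hypothetical helper, not part of Biopython.

```python
def save_chain_as_mmcif(structure, chain_id, out_path):
    """Write only the chain `chain_id` of `structure` to `out_path` as mmCIF."""
    # Imported lazily so that defining this helper does not itself need Biopython.
    from Bio.PDB import MMCIFIO, Select

    class ChainSelect(Select):
        # Keep a chain only if its ID matches the requested one.
        def accept_chain(self, chain):
            return chain.get_id() == chain_id

    io = MMCIFIO()
    io.set_structure(structure)
    io.save(out_path, ChainSelect())
```

With the structure parsed as in the code above, the call would be save_chain_as_mmcif(structure, 'aA', '6kmw_aA.cif'). The output is mmCIF rather than PDB, so this only helps if the downstream tools accept that format.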