
I am writing a script that renumbers protein structures (CIF files) and then saves them as PDB files (Biopython does not have a function for saving CIF files).

For most of the files I use, it works. But for files such as 6ek0.pdb, 5t2c.pdb, and 4v6x.pdb, I keep getting the same TypeError from the same line in io.save. The error also occurs when I do not renumber the file at all and only read and write it, like this:

```python
from Bio import PDB

io = PDB.PDBIO()
pdb_parser = PDB.MMCIFParser()  # reads the mmCIF input
pdbfile = '/Users/jbibbe/Documents/2018Masterstage_2/Scripts_part2/PDBfiles/5t2c.cif'
structure = pdb_parser.get_structure(' ', pdbfile)
io.set_structure(structure)
io.save(pdbfile[:-4] + '_test.pdb')  # writes PDB format; this line raises the TypeError
```

The error is:

```
Traceback (most recent call last):
  File "/Users/jbibbe/Documents/2018Masterstage_2/Scripts_part2/testerfile.py", line 8, in <module>
    io.save(pdbfile[:-4] + '_test.pdb')
  File "/Users/jbibbe/anaconda2/lib/python2.7/site-packages/Bio/PDB/PDBIO.py", line 222, in save
    resseq, icode, chain_id)
  File "/Users/jbibbe/anaconda2/lib/python2.7/site-packages/Bio/PDB/PDBIO.py", line 112, in _get_atom_line
    return _ATOM_FORMAT_STRING % args
TypeError: %c requires int or char
```

I looked at the code and at the atom properties, but I could not see what was wrong with their types. Most of the fields in _ATOM_FORMAT_STRING are checked thoroughly by Biopython, so I assumed their types were correct.

I hope you can help me. If I can improve this question in any way, please let me know (I am new here).

Edit: To be clear, what I want to do is

  1. understand what went wrong
  2. save the structure
Janne B

1 Answer


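The error is triggered when Biopython tries to write a two-letter chain name using the %c format specifier in _ATOM_FORMAT_STRING; %c accepts only a single character (or an integer character code), not a two-character string.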
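The failure can be reproduced without Biopython at all. The sketch below uses a simplified, hypothetical stand-in for the chain-ID field of _ATOM_FORMAT_STRING (the names `CHAIN_FORMAT` and `format_chain` are made up for this example); the real format string has many more fields, but the chain ID is formatted with %c in the same way.

```python
# Hypothetical, simplified stand-in for the chain-ID field of Biopython's
# _ATOM_FORMAT_STRING, which formats the chain ID with "%c".
CHAIN_FORMAT = "chain=[%c]"


def format_chain(chain_id):
    """Format a chain ID the way a %c conversion does."""
    return CHAIN_FORMAT % chain_id


print(format_chain("A"))  # a single character works: chain=[A]

try:
    format_chain("Lf")  # a two-character chain ID, as in large ribosome entries
except TypeError as exc:
    # The exact message wording depends on the Python version.
    print("TypeError:", exc)
```

This is the same TypeError as in the traceback: nothing is wrong with the other atom properties, only the chain ID is too long for the field.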

More generally, big structures such as 5T2C (a ribosome) cannot be written in the traditional PDB format. Many programs and libraries accept two-character chain names (written in columns 21-22), but the standard allows only a single-character chain name in column 22. Such structures also need an extension of the atom serial numbering to go beyond 99,999 atoms; the most popular one is hybrid-36.
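To make the hybrid-36 idea concrete, here is a minimal Python sketch of the encoding for a fixed-width field (width 5 is the atom serial number field). Values that fit stay plain decimal; after 99999 the field continues with upper-case base-36 strings starting at "A0000", and after "ZZZZZ" with lower-case ones starting at "a0000". The function names are chosen for this example only; compare against a reference hybrid-36 implementation before relying on it.

```python
DIGITS_UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS_LOWER = "0123456789abcdefghijklmnopqrstuvwxyz"


def _base36(value, digits):
    """Encode a non-negative integer with the given 36-symbol alphabet."""
    chars = []
    while True:
        value, rem = divmod(value, 36)
        chars.append(digits[rem])
        if value == 0:
            break
    return "".join(reversed(chars))


def hy36encode(width, value):
    """Encode an integer as a hybrid-36 string of exactly `width` characters.

    Values below 10**width stay right-justified decimal; larger values
    continue with upper-case base 36 ("A000...") and then lower-case
    base 36 ("a000...").
    """
    n = value
    if n > -10 ** (width - 1):
        if n < 10 ** width:
            return str(n).rjust(width)  # classic PDB decimal numbering
        block = 26 * 36 ** (width - 1)   # codes available per letter case
        offset = 10 * 36 ** (width - 1)  # skip codes starting with a digit
        n -= 10 ** width
        if n < block:
            return _base36(n + offset, DIGITS_UPPER)
        n -= block
        if n < block:
            return _base36(n + offset, DIGITS_LOWER)
    raise ValueError("%d does not fit in %d hybrid-36 characters" % (value, width))


for serial in (99999, 100000, 100036):
    print(repr(hy36encode(5, serial)))
# '99999'
# 'A0000'
# 'A0010'
```

The design keeps every existing decimal serial number unchanged, so files with fewer than 100,000 atoms look exactly like classic PDB files.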

In any case, Biopython does not support writing such big structures as PDB files.

(If you describe exactly what you want to do, someone may be able to suggest another solution.)

marcin
This is most of what I needed (I also said so in an edit of my question). Now that I understand what caused the error, I can simply rename chain Lf to chain A and save only that chain, as I only need this part of the ribosome. I also emailed the creator of Biopython asking for a CIF saving tool :-) – Janne B May 29 '18 at 12:49
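When only one chain is needed, renaming it (for example Lf to A) is enough. If several chains with long IDs had to be kept, a plain-Python sketch like the one below could plan a mapping to unused one-character IDs. The function name `single_char_chain_map` and the candidate alphabet (letters and digits) are assumptions of this example; applying the mapping, for example by setting each chain's `id` in Biopython before calling `io.save`, is not shown. Large assemblies can have more chains than available one-character IDs, in which case the sketch raises ValueError.

```python
import string

# One-character chain IDs assumed usable in column 22 of a PDB file.
CANDIDATE_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def single_char_chain_map(chain_ids):
    """Map each (unique) chain ID to a one-character ID.

    IDs that are already one character long are kept; longer IDs get the
    next unused candidate. Raises ValueError if the candidates run out.
    """
    used = set(cid for cid in chain_ids if len(cid) == 1)
    free = iter([c for c in CANDIDATE_IDS if c not in used])
    mapping = {}
    for cid in chain_ids:
        if len(cid) == 1:
            mapping[cid] = cid
        else:
            new_id = next(free, None)
            if new_id is None:
                raise ValueError("more chains than one-character IDs")
            mapping[cid] = new_id
    return mapping


# "Lf" is mapped to "C", the first letter not already used by another chain.
print(single_char_chain_map(["A", "Lf", "B"]))
```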