Biopython: export the protein fragment from PDB to a FASTA file

Question

I am writing a fragment of a PDB protein sequence to FASTA format as shown below.

from Bio.SeqIO import PdbIO, FastaIO

def get_fasta(pdb_file, fasta_file, transfer_ids=None):
    fasta_writer = FastaIO.FastaWriter(fasta_file)
    fasta_writer.write_header()
    for rec in PdbIO.PdbSeqresIterator(pdb_file):
        if len(rec.seq) == 0:
            continue
        if transfer_ids is not None and rec.id not in transfer_ids:
            continue
        print(rec.id, rec.seq, len(rec.seq))
        fasta_writer.write_record(rec)

get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])
get_fasta(open('pdb1olg.ent'), open('1olg.fasta', 'w'), transfer_ids=['1OLG:B'])
get_fasta(open('pdb1ycq.ent'), open('1ycq.fasta', 'w'), transfer_ids=['1YCQ:B'])

It gives the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-8ecf92753ac9> in <module>
     12         fasta_writer.write_record(rec)
     13 
---> 14 get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])
     15 get_fasta(open('pdb1olg.ent'), open('1olg.fasta', 'w'), transfer_ids=['1OLG:B'])
     16 get_fasta(open('pdb1ycq.ent'), open('1ycq.fasta', 'w'), transfer_ids=['1YCQ:B'])

<ipython-input-9-8ecf92753ac9> in get_fasta(pdb_file, fasta_file, transfer_ids)
     10             continue
     11         print(rec.id, rec.seq, len(rec.seq))
---> 12         fasta_writer.write_record(rec)
     13 
     14 get_fasta(open('pdb1tup.ent'), open('1tup.fasta', 'w'), transfer_ids=['1TUP:B'])

~/anaconda3/envs/bioinformatics/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py in write_record(self, record)
    303     def write_record(self, record):
    304         """Write a single Fasta record to the file."""
--> 305         assert self._header_written
    306         assert not self._footer_written
    307         self._record_written = True

AttributeError: 'FastaWriter' object has no attribute '_header_written'

I searched around and checked this, this, and this, but could not resolve the issue. The complete code is here; the issue is in the last cell.

Edit: I am using

conda version : 4.8.3
conda-build version : 3.18.11
python version : 3.7.6.final.0
biopython version : 1.77.dev0

Interesting problem. I cannot reproduce it on my machine. The code works fine with Python 3.6.9 and biopython==1.76. Looking at Biopython's source, I cannot see how the field _header_written could be missing. Which biopython version are you using? — Lydia van Dyke, May 24 '20 at 10:12
@LydiavanDyke I am using biopython==1.77.dev0. The problem with 1.76 is that the SwissProt feature table format has changed, and support for the new format was added in 1.77.dev0. — Dr. Abrar, May 24 '20 at 17:26
I see. At the moment my best guess is that a bug appeared between 1.76 and 1.77.dev0. I suggest you try to reproduce the error with 1.76. If it disappears with the older version, I would suggest filing a bug report against Biopython. — Lydia van Dyke, May 24 '20 at 18:51
@LydiavanDyke thank you, you were right, it works fine in biopython 1.76. I reported an issue on GitHub. — Dr. Abrar, May 29 '20 at 17:53
Thank you for taking the effort of reporting the bug. Good luck with your project and happy coding :) — Lydia van Dyke, May 29 '20 at 23:03

score 0 · Answer 1 · answered Sep 09 '20 at 07:34

I'm not sure about fasta_writer which I don't use, but you can just store the string sequences you need into a list or dict and then manually write them to fasta:

## with a list (records are named by their index)
data = '>'+'\n>'.join([f'{i}\n{seq}' for i, seq in enumerate(seq_list)])+'\n'
## or with a dict (use .items(); .iteritems() exists only in Python 2)
data = '>'+'\n>'.join([f'{name}\n{seq}' for name, seq in seq_dict.items()])+'\n'

with open('path/to/my-fasta-file.fasta', 'wt') as f:
    f.write(data)

(new line at end of data is only necessary if this is all part of a larger loop where you write batches of seq_list into the same fasta file)
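
For what it's worth, the same manual approach can be wrapped in a small helper that also breaks long sequences over several lines, which many FASTA files do by convention. This is only a sketch using the standard library: the function name format_fasta and the default width of 60 are my own choices, not anything from Biopython.

```python
def format_fasta(records, width=60):
    """Return FASTA text for a mapping of name -> sequence string.

    `format_fasta` and `width` are illustrative names, not a Biopython API.
    """
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        # Split the sequence so that no line is longer than `width` characters.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    # End with a newline so that several batches can be appended to one file.
    return ("\n".join(lines) + "\n") if lines else ""
```

The returned string can be written with f.write(format_fasta(seq_dict)) inside the same with block as above.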

Thank you. It was a bug and was fixed by the concerned team on GitHub. The new version is compatible with both options — Dr. Abrar, Sep 11 '20 at 16:22

score 0 · Answer 2 · answered Oct 12 '21 at 20:06

You can do this with Biopython

from Bio import SeqIO
pdbfile = '2tbv.pdb'
with open(pdbfile) as handle:
    sequence = next(SeqIO.parse(handle, "pdb-atom"))
with open("2tbv.fasta", "w") as output_handle:
    SeqIO.write(sequence, output_handle, "fasta")
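
Note that the snippet above keeps only the first record that "pdb-atom" yields. If you need specific chains, as the question does with transfer_ids, something along these lines may work. It is a rough sketch: it uses the "pdb-seqres" parser (the same data PdbSeqresIterator reads) together with SeqIO.write, so write_header and write_record are never called by hand. The names select_records and pdb_to_fasta are made up for illustration, and I have not tried this against the 1.77.dev0 build from the question.

```python
def select_records(records, wanted_ids=None):
    """Keep non-empty records; if wanted_ids is given, keep only those IDs."""
    selected = []
    for rec in records:
        if len(rec.seq) == 0:
            continue
        if wanted_ids is not None and rec.id not in wanted_ids:
            continue
        selected.append(rec)
    return selected


def pdb_to_fasta(pdb_path, fasta_path, wanted_ids=None):
    """Write the selected SEQRES chains of one PDB file to a FASTA file."""
    # Imported here so select_records can be used without Biopython installed.
    from Bio import SeqIO

    # Context managers close both files, so the output is flushed to disk.
    with open(pdb_path) as pdb_handle:
        records = select_records(SeqIO.parse(pdb_handle, "pdb-seqres"), wanted_ids)
    with open(fasta_path, "w") as fasta_handle:
        # SeqIO.write returns the number of records written.
        return SeqIO.write(records, fasta_handle, "fasta")
```

With the file names from the question, a call would look like pdb_to_fasta("pdb1tup.ent", "1tup.fasta", wanted_ids=["1TUP:B"]). A return value of 0 means no record matched, which is worth checking because the record IDs depend on what the PDB file contains.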

Jacob Stern


Thank you. It was a bug and was fixed by the concerned team on GitHub. The new version is compatible with both options – Dr. Abrar Oct 18 '21 at 11:09
