
I'm very new to Python, but I've been using it to extract the sequence of a gene from a GenBank file. The issue is that sometimes I get the output I want (the sequence printed to a file) and sometimes I get a KeyError, depending on which accession I'm using. Does anyone know why it might sometimes give a KeyError? I thought it might have something to do with the GenBank records themselves, but they look pretty similar and the gene is there (in the gene feature's qualifiers). For example, it works with HG738867.1 but not with AP019703.1. Here's my code -

from Bio import Entrez, SeqIO

gi_genome = 'accession'
name = 'acrA'
Entrez.email = 'email'
handle = Entrez.efetch(db="nucleotide", id=gi_genome, rettype="gbwithparts", retmode="text")
record = SeqIO.read(handle, "gb")
handle.close()
element = 0
for feature in record.features:
    if feature.type == 'CDS' and name in feature.qualifiers["gene"]:
        report = 'record.features[%s]' % str(element)
        gene_sequence = feature.extract(record.seq)
        with open('output.fasta', 'a') as f:
            print('>' + gi_genome + ' ' + name, file=f)
            print(gene_sequence, file=f)
        break
    else:
        element = element + 1

Here's the traceback -

Traceback (most recent call last):
  File "/home/ubuntu/Documents/Git_Branches/Project_planning/Learning/In_progress/utils/data.py", line 11, in <module>
    if feature.type == 'CDS' and name in feature.qualifiers["gene"]:
KeyError: 'gene'

Process finished with exit code 1
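The traceback points at feature.qualifiers["gene"]. In Biopython, qualifiers is a dictionary of the feature's /qualifier entries, and not every CDS in a GenBank record necessarily carries a /gene qualifier (some have only /locus_tag, /product and so on). Because the "and" only looks up "gene" for CDS features, and the loop stops at the first match, the KeyError is raised at the first CDS without /gene that comes before the target gene. Presumably AP019703.1 has such a CDS ahead of acrA and HG738867.1 does not, though that is an assumption, not something checked here. Below is a minimal sketch of one way around it: dict.get with an empty-list default, so those features are skipped instead of crashing the loop. The find_cds_by_gene helper is hypothetical, and the SimpleNamespace objects are invented stand-ins that model only the .type and .qualifiers attributes of a Biopython SeqFeature, so the example runs without Biopython or network access.

```python
from types import SimpleNamespace


def find_cds_by_gene(features, name):
    """Return the first CDS feature whose /gene qualifier contains name, or None.

    Uses dict.get() so CDS features without a /gene qualifier are skipped
    instead of raising KeyError.
    """
    for feature in features:
        # qualifiers maps qualifier names to lists of strings, e.g. {"gene": ["acrA"]}
        if feature.type == "CDS" and name in feature.qualifiers.get("gene", []):
            return feature
    return None


# Hypothetical stand-ins for SeqFeature objects (only .type and .qualifiers).
features = [
    SimpleNamespace(type="source", qualifiers={"organism": ["Example organism"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["EX_0001"]}),  # no /gene
    SimpleNamespace(type="CDS", qualifiers={"gene": ["acrA"], "locus_tag": ["EX_0002"]}),
]

# The original lookup fails on the second feature, before acrA is reached.
try:
    for feature in features:
        if feature.type == "CDS" and "acrA" in feature.qualifiers["gene"]:
            break
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'gene'

hit = find_cds_by_gene(features, "acrA")
print(hit.qualifiers["locus_tag"])  # prints: ['EX_0002']
```

In the script above, this would replace the loop with something like feature = find_cds_by_gene(record.features, name), calling feature.extract(record.seq) only when feature is not None. If a record annotates the gene only under some other qualifier, the helper returns None rather than a sequence, so that case still needs handling.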

Thanks in advance!

donna
  • Welcome to SO, and thank you for a good question. An error message with the traceback would be helpful. And yes, it is probably a data-related problem. But there is a way to improve your code a bit to get rid of those errors. – ex4 Mar 01 '21 at 17:20
  • Try cross-posting to https://bioinformatics.stackexchange.com/ and add the biopython tag there. – pippo1980 Mar 01 '21 at 17:44
  • Thanks for the advice! I have added the traceback and cross-posted. – donna Mar 02 '21 at 07:09
  • @donna Did you solve it? 'creD' gives the error with both genomes; both lack a reference to ebi.ac.uk/QuickGO/ in the gene feature description... not sure. – pippo1980 May 04 '21 at 18:08
  • Hi @pippo1980, I did. I took your suggestion and posted it to the Bioinformatics Stack Exchange, and someone posted an answer that worked - https://bioinformatics.stackexchange.com/a/15456/10648 – donna May 06 '21 at 05:31
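To test the idea that the two records differ, one could list the CDS features that have no /gene qualifier in each record and check whether any of them come before the target gene. The following is a hedged sketch, not the fix from the linked answer. cds_missing_qualifier is a hypothetical helper, and the features below are invented stand-ins (modelling only .type and .qualifiers). With real data it would be called on record.features from the script above, once per accession.

```python
from types import SimpleNamespace


def cds_missing_qualifier(features, key="gene"):
    """Return (index, locus_tag) for every CDS feature lacking the given qualifier."""
    missing = []
    for index, feature in enumerate(features):
        if feature.type == "CDS" and key not in feature.qualifiers:
            # Fall back to "?" if the feature has no /locus_tag either.
            locus = feature.qualifiers.get("locus_tag", ["?"])[0]
            missing.append((index, locus))
    return missing


# Hypothetical stand-ins for a record's features.
features = [
    SimpleNamespace(type="source", qualifiers={"organism": ["Example organism"]}),
    SimpleNamespace(type="gene", qualifiers={"locus_tag": ["EX_0001"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["EX_0001"],
                                            "product": ["hypothetical protein"]}),
    SimpleNamespace(type="gene", qualifiers={"gene": ["acrA"], "locus_tag": ["EX_0002"]}),
    SimpleNamespace(type="CDS", qualifiers={"gene": ["acrA"], "locus_tag": ["EX_0002"]}),
]

print(cds_missing_qualifier(features))  # prints: [(2, 'EX_0001')]
```

Because the original loop breaks at the first match, only entries with an index lower than the target CDS's index would trigger the KeyError. The same check could be run when searching for other genes, such as 'creD' from the comment above.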
