I am creating a code that takes accession numbers from an existing excel file and ultimately running it through BLAST using Entrez. I am very new to Python and pretty much just learning as a I write. Here is what my code looks like right now, it currently will run and run but never complete and never return any errors. Any help would be greatly appreciated!
import pandas
from Bio import SeqIO
from Bio import Entrez
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
genesexcel=pandas.read_excel(r'C:\Users\bmn45\OneDrive - Drexel University\Monoterpenoid Indole Alkaloids\MIA_genes.xlsb', sheet_name='Data S16')
#import the excel into variable
acc_num=genesexcel['Accession number'].values
#store values from first column of excel
print(acc_num)
for number in acc_num:
handle = Entrez.esearch(db='protein', term=number, retmax="20")
#searching the Genbank database using the accession number
Entrez_result = Entrez.read(handle)
ID_List = Entrez_result['IdList']
#storing the ID number(s) from the Entrez search of the accession number
handle2 = Entrez.efetch(db='protein', id=ID_List, rettype='gb')
#uses efecth to get the full sequence record/file using the searched ID number
#currently, the ID value is hardcoded, database is protein and the return file type (rettype) is set as a genbank file(gb)
seq_record = SeqIO.read(handle2, "genbank")
#storing the full sequence record as a genbank file
seq_ID=(seq_record.name)
#storing the name of the sequence record (the full sequence ID)
result_handle = NCBIWWW.qblast("tblastn", "nt", seq_ID, entrez_query="txid4056[ORGN]")
#searching the nucleotide database ("nt") from protein sequence using tblastn, refined search related to Apocynacae (taxID4056)
with open("my_blast_result.xml", "w") as out_handle:
out_handle.write(result_handle.read())
result_handle.close()
#saving the blast results into a file
result_handle = open("my_blast_result.xml")
#loading saved BLAST results
blast_records = NCBIXML.parse(result_handle)
#parsing (analying) through the returned hits
print("Alignments for sequence", next(blast_records).query)
for alignment in blast_records:
print("Accession number:", alignment.accession)
print("Sequence:", alignment.title)
print("Length:", alignment.length)
print()
print(result_handle.read())
I have tried running the code with a hard-coded accession number and ID (without looping through) and it worked just fine so I know my problem lies with how my loop is set up.