Difficult to interpret the .xml result file from Biopython's NCBI BLAST function

Question

I want to search the sequence 'CCTTCATTCTTCTGTATTGGAGACTTACAGTTGGCACAAGGCTTGGAGTT' against the pig nucleotide genome sequences and see whether I can find a perfect match in the alignments. I used Biopython to access NCBI BLAST and fetch the result, which is an _io.StringIO object. I wanted to read that XML output, but it looks different from what I see in the NCBI BLAST web tool. Could you please help me with this?

The script I used returns no hits, whereas the same search in the NCBI BLAST web tool does return alignment hits.

seq2 = 'CCTTCATTCTTCTGTATTGGAGACTTACAGTTGGCACAAGGCTTGGAGTT'
from Bio.Blast import NCBIWWW as ncbi
result1 = ncbi.qblast("blastn", "nr", seq2, entrez_query='pig (taxid:9823)')

# print(result1) gives <_io.StringIO object at 0x7f89f80a1790>

# I tried to save and read the result with the script below, taken from the Biopython BLAST overview.
# However, I see no output: there are no hits in the alignments.

with open('results.xml', 'w') as save_file: 
    blast_results = result1.read() 
    save_file.write(blast_results)

from Bio.Blast import NCBIXML

E_VALUE_THRESH = 1e-20
with open("results.xml") as result_file:
    for record in NCBIXML.parse(result_file):
        if record.alignments:
            print("\n")
            print("query: %s" % record.query[:100])
            for align in record.alignments:
                for hsp in align.hsps:
                    if hsp.expect < E_VALUE_THRESH:
                        print("match: %s " % align.title[:100])

# I also tried the script below, from Stack Overflow:

result_handle = ncbi.qblast("blastn", "nr", seq2, entrez_query= "pig (taxid:9823)")

records = NCBIXML.parse(result_handle)

for i, record in enumerate(records):
    if record.alignments:
        for align in record.alignments:
            print(align.hit_id)
    else:
        print("There is no BLAST result for", i)

# Like the script from the Biopython BLAST overview, this one finds no hits in the alignments.

# The BLAST tool on the NCBI website does return some sequence alignments, though.

score 0 · Answer 1 · answered Mar 02 '23 at 19:26

The issue is the format of the entrez_query argument. It expects an Entrez search expression, not the organism label shown in the BLAST web form.

from Bio.Blast import NCBIWWW as ncbi
from Bio.Blast import NCBIXML

seq2 = 'CCTTCATTCTTCTGTATTGGAGACTTACAGTTGGCACAAGGCTTGGAGTT'
result1 = ncbi.qblast("blastn", "nr", seq2, entrez_query='txid9823[ORGN]')

# NCBIXML.parse yields records lazily, so collect them in a list for reuse.
records_list = list(NCBIXML.parse(result1))
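
One detail worth keeping in mind: the handle returned by qblast is an in-memory text stream (the `_io.StringIO` object from the question), and it can be read only once unless it is rewound. The sketch below illustrates this with a plain `io.StringIO` and placeholder text standing in for a real BLAST response, so it runs without a network call.

```python
import io

# Stand-in for the handle returned by qblast; the XML text is a placeholder,
# not real BLAST output.
handle = io.StringIO("<BlastOutput>placeholder</BlastOutput>")

first_read = handle.read()    # consumes the whole stream
second_read = handle.read()   # nothing is left, so this is an empty string

handle.seek(0)                # rewind to the start of the stream
after_rewind = handle.read()  # the full text is available again
```

This is presumably why saving the text to a file, or keeping the parsed records in a list, is convenient when the results are needed more than once.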

The correct entrez_query format for pig: entrez_query='txid9823[ORGN]'
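
If the same search is needed for other organisms, the filter string can be built from the taxonomy ID. This is a minimal sketch; `organism_query` is a hypothetical helper name, not part of Biopython, and it only reproduces the `txid<ID>[ORGN]` pattern shown above.

```python
def organism_query(taxid):
    """Return an Entrez organism filter such as 'txid9823[ORGN]'.

    Hypothetical helper: it only formats the string and does not check
    that the taxonomy ID actually exists at NCBI.
    """
    taxid = int(taxid)  # accepts 9823 or "9823"
    if taxid <= 0:
        raise ValueError("taxonomy ID must be a positive integer")
    return f"txid{taxid}[ORGN]"


pig_query = organism_query(9823)
```

The resulting string can then be passed as the `entrez_query` argument of `qblast`.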

I ran this code and got a total of 35 alignments in the one Bio.Blast.Record that was returned. The rest of your code should work now that you have results.
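
To check for the perfect match the question asks about, one option is to compare each HSP's identity count and alignment length with the query length. The sketch below assumes the record exposes `alignments`, `hsps`, `title`, `identities`, `align_length`, and `expect`, as records from `NCBIXML.parse` do; `perfect_matches` is a hypothetical helper name, not a Biopython function.

```python
def perfect_matches(record, query_length):
    """Return (title, e_value) pairs for HSPs that match the whole query exactly.

    Assumption: `record` is a parsed BLAST record (or any object with the
    same attributes). An HSP counts as perfect when it spans the full query
    and every aligned column is an identity, which also rules out gaps.
    """
    matches = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.align_length == query_length and hsp.identities == query_length:
                matches.append((alignment.title, hsp.expect))
    return matches
```

With the list built above, a call would look like `perfect_matches(records_list[0], len(seq2))`. Note that the `E_VALUE_THRESH = 1e-20` cutoff from the question may be too strict for a 50-nucleotide query, since short queries tend to produce comparatively large E-values even for exact matches; it is worth printing `hsp.expect` for a few hits before filtering on it.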
