1

I wanted to do the sequence search 'CCTTCATTCTTCTGTATTGGAGACTTACAGTTGGCACAAGGCTTGGAGTT' against the pig nucleotide genome sequences and see if I can find the perdect match in the alignment. I used the biopython to access the ncbi blast and fetch the result, which is a _io.StringIO object. I wanted to read that xml file, however is looks different than what I see in the actual ncbi blast tool in the web. Could you please help me with this?

The script I used does not give hits however, has alignment hits in the ncbi blast tool.

seq2 = 'CCTTCATTCTTCTGTATTGGAGACTTACAGTTGGCACAAGGCTTGGAGTT'
from Bio.Blast import NCBIWWW as ncbi
result1 = ncbi.qblast("blastn", "nr", seq2,entrez_query = 'pig (taxid:9823)')

#print(result1) gives <_io.StringIO object at 0x7f89f80a1790>

#tried to open the file #I used the script below from Biopython:overview of blast, however I see no output, there is no hit in #the alignment

with open('results.xml', 'w') as save_file: 
    blast_results = result1.read() 
    save_file.write(blast_results)

from Bio.Blast import NCBIXML

E_VALUE_THRESH = 1e-20
for record in NCBIXML.parse(open("results.xml")): 
    if record.alignments: 
        print("\n") 
        print("query: %s" % record.query[:100]) 
        for align in record.alignments: 
           for hsp in align.hsps: 
              if hsp.expect < E_VALUE_THRESH: 
                 print("match: %s " % align.title[:100])

#I tried using the script below form the stack overflow as well

result_handle = ncbi.qblast("blastn", "nr", seq2, entrez_query= "pig (taxid:9823)")

records = NCBIXML.parse(result_handle)

for i, record in enumerate(records):
    if record.alignments:
        for align in record.alignments:
            print(align.hit_id)
    else:
        print("There is no BLAST result for", i)

#I used this script from Biopython:overview of blast, however I see no output, there is no hit in #the alignment

#using the blast tool in ncbi gives some sequence alignments though.

type here
Krazykroz
  • 11
  • 1

1 Answers1

0

The issue happens to be with the entrez_query format.

from Bio.Blast import NCBIWWW as ncbi
from Bio.Blast import NCBIXML

seq2 = 'CCTTCATTCTTCTGTATTGGAGACTTACAGTTGGCACAAGGCTTGGAGTT'
result1 = ncbi.qblast("blastn", "nr", seq2, entrez_query='txid9823[ORGN]')

blast_records = NCBIXML.parse(result1)
records_list = []
for record in blast_records:
    records_list.append(record)

The correct entrez_query format for pig: entrez_query='txid9823[ORGN]'

I've ran this code and got a total of 35 alignments in the one Bio.Blast.Record that was returned. The rest of your code should work now that you have results.

Marcelo Paco
  • 2,732
  • 4
  • 9
  • 26