0

How to parse xml file with many xml BLAST outputs pasted into one file one by one Python. I pasted each blast output into one xml file. They are seperated with:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" 
"http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">

And each blast output starts with a root:

<BlastOutput>

My code that parse only the first blast result:

with open(xml, 'r') as out_handle:
    blast_records = NCBIXML.parse(out_handle)
    blast_record = next(blast_records)
    eval_tresh = 0.04
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < eval_tresh:
                print('***Alignment***')
                print('Sequence:', alignment.title)
                # print('Length:', alignment.length)
                # print('E value:', hsp.expect)

Does anyone know how I could parse those blast results one by one with bio.python?

Parfait
  • 104,375
  • 17
  • 94
  • 125
  • I don't think a file containing multiple xml documents is valid XML. You would need to write code to separate it into multiple chunks (e.g., by looking for the ` – larsks May 21 '22 at 15:21
  • Propably not but I wanted to have them all in one file. I actually found the solution. I just made a dirty code that reads row one by one and paste one blast result into the other 'temporary' filethat can be parsed. – Daria G. May 25 '22 at 13:28

0 Answers0