I'm trying to write a Python3 script that performs a global alignment of two sequences, of ~ 10 kb and 11 kb length. Both are very similar to each other. (I'm trying to find the few points where they do not match, one of which I know is close to the end of the query sequence, which is why a local alignment is not sufficient (BLASTN just crops the last 2 bases to get a better alignment score).)
I had hoped to be able to use BioPython for this, specifically the Align.PairwiseAligner. However, I'm getting a MemoryError.
Is there a way to configure or use BioPython to use less memory (more runtime is completely fine - within reason, of course)? Or any other way to perform a global alignment of 2 sequences that I can use without having to resort to external software like Emboss Needle?
(This is part of a larger software. I already have to ship BLAST and BioPython along. I would prefer not to have to ship yet another software, if it can be avoided.)
Here's an MVCE:
from Bio import Align
ref_seq = "GAAAGACCTGAAAGATCACGGTGCCTTCATTTCAACTGTGAGACATGAAGTAATTTTCCCAAATCTACAACATTAAGATATGGTGCAATAAGGACCAGATTAAAGGTCTCCTGATTTGCGGCCATGTTCCCTCCATCTCCTTTACTCCTAAACACACTCACACTCACTACTGCAAATAGTTGTCTTGTCAAGTGGGAAATGAATGCTCTTACAAGGCTCAAACTTGTGAACACATCACTGACCAGCACAGAGCTGGCTACAATAGCTCCCCAATTAAGGTGTTTTACATGCAACTGGTTCAAACCTTCCAAGTGCTAAATTAAAACAATCCTTTAAAGAAGGAAATTCTGTTTCAGAAGAGGACCTTCATACAGCATCTCTGACCAGCAACTGATGATGCTATTGAACTCAGATGCTGATTGGTTCTCCAACACGAGATTACCCAACCCAGGAGCAAGGAAATCAGTAACTTCCTCCCTATAACTTGGAATGTGGGTGGAGGGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTAAGCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATCCTAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAATAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCATTGGATCCCAAATTATTTCCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTTCTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAATAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTACTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCATAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTAAAGAACTTAAGGCATCCTCTGAAAAACTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTTAACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAATGTATAAATTATATGAAATTCTATTTTTTAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCTATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACTTTTTTTCTCTTTTTGAGGTAGGGCCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAAGAAATGACTCTTAATAAAAAAATTTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATGTTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTGAAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTTATTATTTATTTACCTATTTATTTGTTTATTTATTTTATTTTATTTTGTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTTGTATTTTTAGTAGGGATAGGATTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAACTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTTGCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGAGCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGCTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTTCAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGAGAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGTAGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGGCTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAAATGAATCCGGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAACATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATTGATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAATAATGTGAGCTGCAGTTATAGGGATCAAAAAAGTTACAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAATTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGTACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTAGTTCCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTATATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTTGACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATATCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCTGTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGGGGGTCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCCGTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTTCGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAAGAAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTATATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAACACAGTTACTATTTTATTATAATGCTAGTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTCCCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTTATTTGTAATTGCTTTGAATTATTTTGACCTATTTATTGGCGGATTATAATTATTGCTCTAAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAACTTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAATAGTTGTATTTATTGAGAGTATGCTAGTGTTGGGGATTTTCCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAATTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAATTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCTGAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGGTAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATTTTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTGAAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGCTCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTACAAGAGGTTTCTAAATGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTCTCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCACTTTTAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTGCTTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAATGATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCCACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGTTGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAAGCTTTCCTGCTTGGCAGTTATTCTTCCACAAGAGAGGGCTTTCTCAGGACCTGGTTGCTACTGGTTCGGCAACTGCAGAAAATGTCCTCCCTTGTGGCTTCCTCAGCTCCTGCCCTTGGCCTGAAGTCCCAGCATTGATGACAGCGCCTCATCTTCAACTTTT"
query_seq = "GAGTGAGACTTGCCTGCTAAGCTGGCCCCTGCTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATCCTAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAATAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCATTGGATCCCAAATTATTTCCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTTCTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAATAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTACTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCATAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTAAAGAACTTAAGGCATCCTCTGAAAAACTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTTAACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAATGTATAAATTATATGAAATTCTATTTTTTAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCTATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACTTTTTTTCTCTTTTTGAGGTAGGGCCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAAGAAATGACTCTTAATAAAAAAATTTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATGTTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTGAAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTTATTATTTATTTACCTATTTATTTGTTTATTTATTTTATTTTATTTTGTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTTGTATTTTTAGTAGGGATAGGATTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAACTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTTGCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGAGCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGCTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTTCAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGAGAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGTAGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGGCTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAAATGAATCCGGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAACATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATTGATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAATAATGTGAGCTGCAGTTATAGGGATCAAAAAAGTTACAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAATTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGTACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTAGTTCCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTATATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTTGACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATATCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCTGTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGGGGGTCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCCGTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTTCGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAAGAAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTATATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAACACAGTTACTATTTTATTATAATGCTAGTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTCCCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTTATTTGTAATTGCTTTGAATTATTTTGACCTATTTATTGGCGGATTATAATTATTGCTCTAAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAACTTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAATAGTTGTATTTATTGAGAGTATGCTAGTGTTGGGGATTTTCCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAATTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAATTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCTGAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGGTAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATTTTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTGAAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGCTCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTACAAGAGGTTTCTAAATGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTCTCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCACTTTTAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTGCTTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAATGATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCCACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGTTGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGGA"
aligner = Align.PairwiseAligner()
aligner.match_score = 2
aligner.mismatch_score = -3
aligner.open_gap_score = -5
aligner.extend_gap_score = -2
aligner.query_right_open_gap_score = 0
aligner.query_left_open_gap_score = 0
aligner.query_right_extend_gap_score = 0
aligner.query_left_extend_gap_score = 0
alignments = aligner.align(ref_seq, query_seq) # <= this is where the MemoryError occurs
for a in sorted(alignments):
print(a)
This is Python 3 on Windows 10. The Biopython version is 1.72.
Edit: and here's the Traceback:
Traceback (most recent call last):
File "C:/foo/temp.py", line 15, in <module>
alignments = aligner.align(ref_seq, query_seq) # <= this is where the MemoryError occurs
File "C:\Program Files (x86)\Python36\lib\site-packages\Bio\Align\__init__.py", line 1352, in align
score, paths = _aligners.PairwiseAligner.align(self, seqA, seqB)
MemoryError: Out of memory