I'm somewhat familiar with Biopython's pairwise2 module, but I've noticed that it inserts gaps (dashes) into the sequences to obtain the best possible alignment score. For example,
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
for a in pairwise2.align.globalxx("ACCGT", "ACG"):
    print(format_alignment(*a))
prints the following:
ACCGT
|||||
A-CG-
Score=3

ACCGT
|||||
AC-G-
Score=3

It does this even though the first two characters (A and C) of the second sequence already match the first sequence without any gaps. Is there a way to count the number of matching bases position by position, without gaps, rather than the highest achievable number of matches? For example, ACTGAA compared with GCCGTA should give a score of 3 (matches at positions 2, 4 and 6).
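To make the difference concrete, here is a minimal pure-Python sketch of the two quantities. The helper names `lcs_length` and `count_positional_matches` are hypothetical, not Biopython functions. With `globalxx`, identical characters score 1 and there are no mismatch or gap penalties, so the best global alignment score equals the length of the longest common subsequence. The count asked for here instead compares the sequences index by index with no gaps. The sketch assumes that when the lengths differ, only the overlapping prefix is compared.

```python
def lcs_length(seq_a: str, seq_b: str) -> int:
    """Length of the longest common subsequence.

    This matches the best score of a globalxx-style alignment:
    match = 1, mismatch = 0, gaps are free.
    """
    # After processing the first i characters of seq_a,
    # prev[j] holds the LCS length of seq_a[:i] and seq_b[:j].
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a char in one sequence
        prev = curr
    return prev[-1]


def count_positional_matches(seq_a: str, seq_b: str) -> int:
    """Count positions where both sequences carry the same base, with no gaps.

    Assumption: if the lengths differ, zip() stops at the shorter
    sequence, so the extra characters of the longer one are ignored.
    """
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


print(lcs_length("ACCGT", "ACG"))                     # 3
print(count_positional_matches("ACTGAA", "GCCGTA"))   # 3
print(count_positional_matches("ACCGT", "ACG"))       # 2
```

For the ACCGT/ACG pair the two measures disagree: the gapped alignment finds 3 matches, while the ungapped position-by-position comparison finds only the leading A and C.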