0

I'm using the wrapped version of smith-waterman from skbio (0.5.4), but i have an unspected error:

_, score, _ = local_pairwise_align_ssw(protein_list[idx1], protein_list[idx2], substitution_matrix = blosum62)
File "/anaconda3/lib/python3.6/site-packages/skbio/alignment/_pairwise.py", line 732, in local_pairwise_align_ssw
validate=False)
File "/anaconda3/lib/python3.6/site-packages/skbio/alignment /_tabular_msa.py", line 785, in __init__
reset_index=minter is None and index is None)
File "/anaconda3/lib/python3.6/site-packages/skbio/alignment /_tabular_msa.py", line 1956, in extend
self._assert_valid_sequences(sequences)
File "/anaconda3/lib/python3.6/site-packages/skbio/alignment /_tabular_msa.py", line 2035, in _assert_valid_sequences
% (length, expected_length))
ValueError: Each sequence's length must match the number of positions in the MSA: 232 != 231

The weird thing is that sometimes the error appears with protein pair 0-10, and others with 0-116. So, i don't believe it's an error from protein fromat.

cw_cw
  • 1
  • 2

1 Answers1

1

I have a similar problem. However, I was able to limit the error to the optimized SSW version. So no error in the sequence formatting.

import warnings
from skbio.sequence import Protein
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message="...")
    from Bio.Align import substitution_matrices
from skbio.alignment import local_pairwise_align_ssw
from skbio.alignment import local_pairwise_align

peptide1 = Protein("CGAGDNQAGTALIF")
peptide2 = Protein("CAGEEGGGADGLTF")
gap_open_penalty = 10
gap_extend_penalty = 10
substitution_matrix = substitution_matrices.load("BLOSUM45")

## works correct
rv = local_pairwise_align_ssw(
      sequence1 = peptide1
    , sequence2 = peptide2
    , gap_open_penalty=1
    , gap_extend_penalty=1
    , substitution_matrix=substitution_matrix
)
print(rv)

## but if I swap peptide1 and peptide 2 the ValueError occur
rv = local_pairwise_align_ssw(
      sequence1 = peptide2
    , sequence2 = peptide1
    , gap_open_penalty=1
    , gap_extend_penalty=1
    , substitution_matrix=substitution_matrix
)
print(rv)

## if I do the same with local_pairwise_align it works!
rv = local_pairwise_align(
      seq1=peptide2
    , seq2=peptide1
    , gap_open_penalty=1
    , gap_extend_penalty=1
    , substitution_matrix=substitution_matrix
)
print(rv)
cpeikert
  • 31
  • 5