I have created a sequence alignment tool to compare two strands of DNA (X and Y) to find the best alignment of substrings from X and Y. The algorithm is summarized here (https://en.wikipedia.org/wiki/Smith–Waterman_algorithm). I have been able to generate a lists of lists, filling them all with zeros, to represent my matrix. I created a scoring algorithm to return a numerical score for each kind of alignment between bases (eg. plus 4 for a match). Then I created an alignment algorithm that should put a score in each coordinate of my "matrix". However, when I go to print the matrix, it only returns the original with all zeros (rather than actual scores).
I know there are other methods of implementing this method (with numpy for example), so could you please tell me why this specific code (below) does not work? Is there a way to modify it, so that it does work?
code:
def zeros(X: int, Y: int):
lenX = len(X) + 1
lenY = len(Y) + 1
matrix = []
for i in range(lenX):
matrix.append([0] * lenY)
def score(X, Y):
if X[n] == Y[m]: return 4
if X[n] == '-' or Y[m] == '-': return -4
else: return -2
def SmithWaterman(X, Y, score):
for n in range(1, len(X) + 1):
for m in range(1, len(Y) + 1):
align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
matrix[n, m] = max(align, indelX, indelY, 0)
print(matrix)
zeros("ACGT", "ACGT")
output:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]