-1

I have created a sequence alignment tool to compare two strands of DNA (X and Y) to find the best alignment of substrings from X and Y. The algorithm is summarized here (https://en.wikipedia.org/wiki/Smith–Waterman_algorithm). I have been able to generate a lists of lists, filling them all with zeros, to represent my matrix. I created a scoring algorithm to return a numerical score for each kind of alignment between bases (eg. plus 4 for a match). Then I created an alignment algorithm that should put a score in each coordinate of my "matrix". However, when I go to print the matrix, it only returns the original with all zeros (rather than actual scores).

I know there are other methods of implementing this method (with numpy for example), so could you please tell me why this specific code (below) does not work? Is there a way to modify it, so that it does work?

code:

def zeros(X: int, Y: int):
    lenX = len(X) + 1
    lenY = len(Y) + 1
    matrix = []
    for i in range(lenX):
        matrix.append([0] * lenY)
    
    def score(X, Y):
        if X[n] == Y[m]: return 4
        if X[n] == '-' or Y[m] == '-': return -4
        else: return -2

    def SmithWaterman(X, Y, score):
        for n in range(1, len(X) + 1):
            for m in range(1, len(Y) + 1):
                align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
                indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
                indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
        matrix[n, m] = max(align, indelX, indelY, 0)
    print(matrix)

zeros("ACGT", "ACGT")

output:

[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

1 Answers1

1

The reason it's just printing out the zeroed out matrix is that the SmithWaterman function is never called, so the matrix is never updated.

You would need to do something like

# ...
SmithWaterman(X, Y, score)
print(matrix)
# ...

However, If you do this, you will find that this code is actually quite broken in many other ways. I've gone through and annotated some of the syntax errors and other issues with the code:

def zeros(X: int, Y: int):
    #        ^       ^  incorrect type annotations. should be str
    lenX = len(X) + 1
    lenY = len(Y) + 1
    matrix = []
    for i in range(lenX):
        matrix.append([0] * lenY)
    # A more "pythonic" way of expressing the above would be:
    # matrix = [[0] * len(Y) + 1 for _ in range(len(x) + 1)]
    
    def score(X, Y):
        #     ^  ^ shadowing variables from outer scope. this is not a bug per se but it's considered bad practice
        if X[n] == Y[m]: return 4
        #    ^       ^  variables not defined in scope
        if X[n] == '-' or Y[m] == '-': return -4
        #    ^              ^  variables not defined in scope
        else: return -2

    def SmithWaterman(X, Y, score): # this function is never called
        #                   ^ unnecessary function passed as parameter. function is defined in scope
        for n in range(1, len(X) + 1):
            for m in range(1, len(Y) + 1):
                align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
                #                 ^ invalid list lookup. should be: matrix[n-1][m-1]
                indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
                #                                          ^ out of bounds error when m == len(Y)
                indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
                #                                  ^ out of bounds error when n == len(X)
        matrix[n, m] = max(align, indelX, indelY, 0)
        # this should be nested in the inner for-loop. m, n, indelX, and indelY are not defined in scope here
    print(matrix)

zeros("ACGT", "ACGT")
Dharman
  • 30,962
  • 25
  • 85
  • 135
George D.
  • 255
  • 2
  • 10
  • I fixed it using your comments and added print(*matrix, sep = "\n") instead, so it reads in the more familiar format of a matrix (each list on a new line). Of course this is the less ideal way of doing this, so I also went ahead and created it using NumPy. – LampoonBonanza Mar 01 '21 at 22:43