Longest common subsequence -- Optimizing memory

Question

I have a question about optimizing memory for a common dynamic programming task: finding the longest common subsequence of two strings. I found an answer to a similar question which says:

Note that when you're calculating the next row of the table in the dynamic programming solution to solve the LCS problem, you only need the previous row and your current row. Then you can modify the dynamic programming solution to keep track of only the previous row and the current row instead of the m x n table. Every time you reach the end of the current row, you set the previous row to the current row, and start from the beginning of the row again. You do this m times where m is the number of rows in your table. This will use space linear in the number of columns.

But I am left with two questions.

First, when you set the previous row to your new one, won't you still have the values from the old row? Won't those affect the results?

Second, why can't you do the same optimization with the columns also? That is, when you reach the end of a column set the previous column to the current column?

Bas Swinckels · Accepted Answer · 2014-09-15T06:25:26.257

Have a look at the Wikipedia page about this problem, especially the figures with the tables. To calculate the result for the cell in row i and column j, you need the previous results of the cells to the left, top and top-left of the current cell, so LCS(i, j) = some_function(LCS(i-1, j), LCS(i, j-1), LCS(i-1, j-1)). To calculate the result for all the cells (which you need to do even if you only want the final answer), it is therefore easiest to calculate the intermediate results in stripes along rows or columns.
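To make the dependency on the left, top and top-left cells concrete, here is a minimal sketch of the unoptimized version that keeps the whole table. The function name lcs_length_full and the variable names are my own choices for illustration, and some_function is spelled out as the usual LCS recurrence (match: top-left + 1, otherwise the max of top and left).

```python
def lcs_length_full(a, b):
    """Length of the longest common subsequence, using the full (m+1) x (n+1) table."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at 0: they represent an empty prefix.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Characters match: extend the result from the top-left cell.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # No match: take the better of the top cell and the left cell.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]
```

Note that row i only ever reads from row i-1 and from cells of row i that were already filled in, which is what makes the memory optimization possible.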

To answer your questions:

1. It is safe to re-use row i-2 to store the new result for row i, since you are simply overwriting the old results from left to right. The only information you need from the current row for calculating cell (i, j) is cell (i, j-1), which you just updated with the new value in the previous step. Everything else comes from row i-1, so the stale values from row i-2 are never read before they are overwritten.
2. There are various ways that you could fill the whole table to finally get the result of the cell in the bottom-right corner. You could fill the table row-by-row (the standard), column-by-column, alternating one column, then one row, or along a diagonal front. Just try it yourself with pen and paper: draw a grid and fill the whole grid with crosses, but only mark cells for which the relevant neighbors are already marked. It is just simpler to implement the row-by-row or column-by-column version, since you only need 2 vectors of fixed length to hold the intermediate results.
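The row re-use from the first point could look roughly like this. It is only a sketch, not the one true implementation; the name lcs_length_two_rows and the buffer names prev and curr are made up for this example.

```python
def lcs_length_two_rows(a, b):
    """Length of the longest common subsequence, keeping only two rows of the table."""
    n = len(b)
    prev = [0] * (n + 1)  # row i-1
    curr = [0] * (n + 1)  # row i, still holding stale values from row i-2
    for ch in a:
        # Index 0 is never written, so it stays 0 in both buffers.
        for j in range(1, n + 1):
            if ch == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                # curr[j - 1] was overwritten in the previous step,
                # so no stale value from row i-2 is ever read.
                curr[j] = max(prev[j], curr[j - 1])
        # The finished row becomes the previous row; the old previous
        # row is recycled as the buffer for the next row.
        prev, curr = curr, prev
    return prev[n]
```

For example, lcs_length_two_rows("AGCAT", "GAC") gives 2. Filling column-by-column instead is the same code with the two strings swapped, so nothing is gained by doing both at once. If memory matters, pass the shorter string as b so that the two buffers have min(m, n) + 1 entries each.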
