Recursion backtracking interview question

Question

I was asked this interview question on recursion and backtracking the other day and have not found a working solution. Here is the question:

Given a grid and a word, write a function that returns the location of the word in the grid as a list of coordinates. If there are multiple matches, return any one.

grid1 = [
    ['c','c','x','t','i','b'], 
    ['c','c','a','t','n','i'],
    ['a','c','n','n','t','t'],
    ['t','c','s','i','p','t'],
    ['a','o','o','o','a','a'],
    ['o','a','a','a','o','o'],
    ['k','a','i','c','k','i']
]
word1 = 'catnip'
word2 = 'cccc'
word3 = 's'
word4 = 'bit'
word5 = 'aoi'
word6 = 'ki'
word7 = 'aaa'
word8 = 'ooo'

grid2 = [
    ['a']
]
word9 = 'a'

The desired solutions are shown below:

find_word_location(grid1, word1) => [(1,1), (1,2),(1,3),(2,3),(3,3),(3,4)] 
find_word_location(grid1, word2) => 
    [(0,1),(1,1),(2,1),(3,1)]
 OR [(0,0),(1,0),(1,1),(2,1)] 
 OR [(0,0),(0,1),(1,1),(2,1)]
 OR [(1,0),(1,1),(2,1),(3,1)] 
find_word_location(grid1,word3) => [(3,2)]
find_word_location(grid1,word4) => [(0,5),(1,5),(2,5)]
find_word_location(grid1,word5) => [(4,5),(5,5),(6,5)] 
find_word_location(grid1,word6) => [(6,4),(6,5)] 
find_word_location(grid1,word7) => [(5,1),(5,2),(5,3)]
find_word_location(grid1,word8) => [(4,1),(4,2),(4,3)] 

find_word_location(grid2,word9) => [(0,0)]

I was also asked about the complexity analysis with the following variables:

r = number of rows
c = number of columns
w = length of the word

Note that only the letter immediately to the right of, or immediately below, the letter being evaluated is considered. If neither of those letters matches the next letter in the word, the search has to backtrack.

I have a Python function but it does not produce the desired output:

coords = [] 
width = len(grid[0])  # 6
height = len(grid)    # 7
def find_word(i , row, col): 
    if i == len(word): 
        return True

    if word[i] == grid[row][col]: 
        coords.append((row, col))
        if col + 1 < width and find_word( i + 1,row, col + 1):
            return True
        elif row + 1 < height and find_word(i+1, row+1, col): 
            return True
        coords.pop() # neither the right or below element provided the next character in word
        return False
    
    if col + 1< width and find_word(i, row,col+1): 
        return True 
    elif row +1 < height and find_word(i, row+1, 0):
        return True

i = 0   # the word index
row = 0 # row index 
col = 0 # column index
find_word(0, 0, 0)

Any help would be much appreciated.

If the letter at (row, col) for "i == 0" is the first letter of the word but the word can't be found there, your function returns False although there are other candidate positions for the first letter. — Michael Butscher, Feb 02 '22 at 21:17
The character at grid1[4][5] is supposed to be an 'a', not an 'o'; the word will always be in the grid. Thanks for the comment, Michael. — Amundeep Singh Dhaliwal, Feb 05 '22 at 16:28

BrokenBenchmark · Answer 1 · 2022-02-05T17:56:18.487

This is a dynamic programming problem. The best indication of this is that for any given letter in the grid, you can only consider its neighbor to its right or below it. This gives an evaluation order for solving subproblems.

The comments address your immediate bug (i.e. you're not considering all possible starting spots). However, even if you fix this issue, it's going to be difficult to analyze the time complexity of your code. You could probably work out the number of recursive calls given enough time, but in an interview setting, you don't want to have to worry about such things. (As an upper bound, you would be dealing with up to 2**w choices of right / down movements for each of r * c potential starting spots, giving a time complexity of O(r * c * 2**w). See more discussion at this question.)

This approach determines whether a given grid element can generate a suffix of the word. We use a memoization table to store a True/False value -- suffix_table[row][col][word_index] is True if we can move down or to the right to generate the suffix word[word_index:], False otherwise.

We can use iteration rather than recursion, together with an explicit memoization table. This isn't the cleanest code in the world, but it is the easiest to analyze:

def find_word_location(grid, word):
    # Fill in suffix table.
    suffix_table = [[[False] * len(word) for _ in range(len(grid[0]))] for _ in range(len(grid))]

    for word_index in range(len(word) - 1, -1, -1):
        for row in range(len(grid) - 1, -1, -1):
            for col in range(len(grid[0]) - 1, -1, -1):
                # Base case: find coordinates in grid that match the last character in the string.
                if word_index == len(word) - 1 and grid[row][col] == word[word_index]:
                    suffix_table[row][col][word_index] = True
                # Recursive case: find coordinates in grid such that:
                # 1. the grid element given by the coordinates matches the given index in the string, and
                # 2. the grid element immediately to the right or immediately below can produce the remaining suffix.
                elif word_index != len(word) - 1 and grid[row][col] == word[word_index] and \
                    ((row + 1 < len(grid) and suffix_table[row + 1][col][word_index + 1]) or \
                    (col + 1 < len(grid[0]) and suffix_table[row][col + 1][word_index + 1])):
                    suffix_table[row][col][word_index] = True

    # Read off answer into a list.
    for row in range(len(grid)):
        for col in range(len(grid[0])):
            if suffix_table[row][col][0]:
                indices = []
                word_index = 0
                row_to_add, col_to_add = row, col
                while word_index < len(word):
                    indices.append((row_to_add, col_to_add))
                    if word_index != len(word) - 1:
                        if row_to_add + 1 < len(grid) and suffix_table[row_to_add + 1][col_to_add][word_index + 1]:
                            row_to_add += 1
                        else:
                            col_to_add += 1
                    word_index += 1

                return indices

    return None

The time analysis is now much more straightforward: there are O(r * c * w) subproblems, each of which takes O(1) time to solve. Reading the answer into a list takes O(w) time. So, the entire algorithm runs in O(r * c * w) time.
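For comparison, here is a minimal sketch of the same idea written top-down, assuming Python's functools.lru_cache stands in for the explicit table. The names find_word_location_memo and can_match are made up for illustration and are not part of the code above. Each (row, col, word_index) state is still evaluated at most once, so the O(r * c * w) bound should carry over, although the recursion depth now grows with w.

```python
from functools import lru_cache


def find_word_location_memo(grid, word):
    """Hypothetical top-down variant: returns a list of (row, col) or None."""
    if not grid or not grid[0] or not word:
        return None
    rows, cols = len(grid), len(grid[0])

    @lru_cache(maxsize=None)
    def can_match(row, col, word_index):
        # True if word[word_index:] can be spelled starting at (row, col)
        # by moving only right or down.
        if row >= rows or col >= cols or grid[row][col] != word[word_index]:
            return False
        if word_index == len(word) - 1:
            return True
        return (can_match(row + 1, col, word_index + 1)
                or can_match(row, col + 1, word_index + 1))

    for row in range(rows):
        for col in range(cols):
            if can_match(row, col, 0):
                # Walk the cached answers to rebuild one valid path.
                r, c = row, col
                path = [(r, c)]
                for word_index in range(1, len(word)):
                    # Prefer moving down, mirroring the read-off loop above.
                    if can_match(r + 1, c, word_index):
                        r += 1
                    else:
                        c += 1
                    path.append((r, c))
                return path
    return None
```

Calling find_word_location_memo(grid1, 'catnip') on the grid from the question should give the same path as the iterative version.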

I love this approach! If you added an early stopping mechanism (since you can know when the remaining search space can't contain the suffix you are looking for), would you also have to account for this early stopping in the complexity analysis? — Tytrox, Feb 02 '22 at 23:16
I'm not sure how much time it'd shave off. It does reduce the number of subproblems you have to evaluate, but it isn't immediately clear to me whether there'd be any changes asymptotically. — BrokenBenchmark, Feb 02 '22 at 23:37

Tytrox · Answer 2 · 2022-02-03T00:51:00.150

I gave this problem a go. Here is my attempt. One thing I noticed, however, is that your example for word5 in grid1 does not seem to be correct: (4,5) refers to 'o', not 'a' as suggested. My implementation returns False in this case:

grid1 = [
    ['c', 'c', 'x', 't', 'i', 'b'],
    ['c', 'c', 'a', 't', 'n', 'i'],
    ['a', 'c', 'n', 'n', 't', 't'],
    ['t', 'c', 's', 'i', 'p', 't'],
    ['a', 'o', 'o', 'o', 'a', 'o'],
    ['o', 'a', 'a', 'a', 'o', 'o'],
    ['k', 'a', 'i', 'c', 'k', 'i']
]
word1 = 'catnip'
word2 = 'cccc'
word3 = 's'
word4 = 'bit'
word5 = 'aoi'
word6 = 'ki'
word7 = 'aaa'
word8 = 'ooo'

grid2 = [
    ['a']
]
word9 = 'a'

co_ords = []


def find_word(word, grid, x, y, explore):
    if len(word) == 0:
        return True

    x_size = len(grid[0])
    y_size = len(grid)

    if len(word) > (x_size - x) + (y_size - y):
        return False

    try:
        char_on = grid[y][x]

        if word[0] == char_on:
            co_ords.append((y, x))
            if find_word(word[1:], grid, x + 1, y, False):
                return True
            elif find_word(word[1:], grid, x, y + 1, False):
                return True
            else:
                co_ords.pop()
        if len(word) - 1 > (x_size - x) + (y_size - y):
            return False
        if explore and find_word(word, grid, x + 1, y, True):
            return True
        else:
            return explore and find_word(word, grid, x, y + 1, True)
    except IndexError:
        return False


if __name__ == '__main__':
    co_ords = []
    print(find_word(word9, grid2, 0, 0, True))
    print(co_ords)

In terms of complexity analysis, assuming worst case, my implementation will have w((r x c) - w) recursive calls, where:

r = number of rows
c = number of columns
w = length of word

I'm not sure about the complexity of a slice operation, but assuming it is linear in the worst case, the overall complexity is O(w^2((r x c) - w)).

If so, you could improve efficiency by not taking the slice, and just passing a reference to the correct location in the word, plus the appropriate adjustments when calculating len(word). The worst case time complexity would then be proportional to w((r x c) - w) = wrc - w^2.
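As a rough sketch of what passing a position instead of a slice could look like (find_from and find_word_by_index are hypothetical names, not the functions above, and this version simply tries every cell as a start rather than using the explore flag):

```python
def find_from(grid, word, row, col, index, path):
    """Hypothetical helper: try to match word[index:] starting at (row, col)."""
    if index == len(word):
        return True  # every character has been matched
    if row >= len(grid) or col >= len(grid[0]) or grid[row][col] != word[index]:
        return False
    path.append((row, col))
    # Try the cell to the right first, then the cell below.
    if (find_from(grid, word, row, col + 1, index + 1, path)
            or find_from(grid, word, row + 1, col, index + 1, path)):
        return True
    path.pop()  # backtrack: neither neighbour continues the word
    return False


def find_word_by_index(grid, word):
    """Try every cell as a starting point; return the first path found or None."""
    for row in range(len(grid)):
        for col in range(len(grid[0])):
            path = []
            if find_from(grid, word, row, col, 0, path):
                return path
    return None
```

No new string is built per call, so each call does constant work besides its recursive calls. The number of calls per starting cell can still grow exponentially in w in the worst case, so this only removes the slicing cost.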

EDIT:

The above analysis is wrong - see here for clarification on the correct answer.

There's `O(r * c - w * c - w * r + w * w) --> O(r * c)` potential starting points -- are you sure your runtime analysis is correct? — BrokenBenchmark, Feb 02 '22 at 22:28
Yep, you were right - I was being stupid! I think I have corrected this now. — Tytrox, Feb 02 '22 at 22:33
I'm still not convinced that your time analysis is quite right -- how are you able to examine whether a starting point is suitable using less than `O(w)` recursive calls? — BrokenBenchmark, Feb 02 '22 at 22:44
OK, I see now. My analysis is still wrong, and my implementation is not optimal. I think the correct check for a valid starting point should be proportional to **w**, and this should be called **rc - w** times, resulting in `O(wrc - w^2)`. I'll work on updating my answer to reflect this. — Tytrox, Feb 02 '22 at 22:54
grid1[4][5] is supposed to be an 'a', the word would always be in the grid. Thank you for noticing. — Amundeep Singh Dhaliwal, Feb 05 '22 at 16:27

Shlomo Gottlieb · Answer 3 · 2022-02-02T23:14:42.580

Here is a possible solution:

def find_coords(grid, row, col, word, coords):
    # word is fully matched
    if len(word) == 0:
        return True

    height = len(grid)
    width = len(grid[0])

    # grid is exhausted
    if (not (height or width) or
        row >= height or col >= width):
        return False

    if grid[row][col] == word[0]:
        coords.append((row, col))

        # lookup right and down sub-grids
        if (find_coords(grid, row, col+1, word[1:], coords) or
            find_coords(grid, row+1, col, word[1:], coords)):
            return True
        else:
            # cleanup current coordinate
            coords.pop()

    # no match
    return False


def find_word(grid, word, coords):
    # try every coordinate, stop on first match
    return any(find_coords(grid, row, col, word, coords)
               for row in range(len(grid))
               for col in range(len(grid[row])))

Then you can call it like this:

coords = []
find_word(grid1, word1, coords)
print(coords)

Richard K Yu · Answer 4 · 2022-02-02T23:13:53.363

There are some good answers already, but I would like to share my approach:

grid1 = [
    ['c','c','x','t','i','b'], 
    ['c','c','a','t','n','i'],
    ['a','c','n','n','t','t'],
    ['t','c','s','i','p','t'],
    ['a','o','o','o','a','o'],
    ['o','a','a','a','o','o'],
    ['k','a','i','c','k','i']
]

coordinates_saver = []

def recursive_search_helper(grid, word, r, c, coordinates, directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]):
        if len(coordinates) == len(word):
            if coordinates not in coordinates_saver:
                coordinates_saver.append(coordinates)
            return coordinates
        
        if grid[r][c] == word[len(coordinates)]:
            coordinates.append((r,c))
            for direction in directions:
                #Check if direction is a valid choice:
                #print(r+direction[0], c+direction[1])
                if(r+direction[0]>=0 and r+direction[0] < len(grid) and c+direction[1] >= 0 and c+direction[1]<len(grid[0])):
                    #We cannot go back.
                    choices = [(-1, 0), (1, 0), (0, -1), (0, 1)]
                    choices.remove((-1*direction[0],-1*direction[1]))
                    recursive_search_helper(grid, word, r+direction[0], c+direction[1], coordinates, choices)
        
        #The letter at r,c doesn't match.
        else:
            return
                

def recursive_search(grid, word, coordinates):
    for i in range(len(grid)):
        for j in range(len(grid[i])):
            #print(f"On {i,j}")
            recursive_search_helper(grid, word, i, j, coordinates)
            coordinates =[]

recursive_search(grid1, "catnip", [])
print(coordinates_saver)

coordinates_saver = []
recursive_search(grid1, "cccc", [])
print(coordinates_saver)

coordinates_saver = []
recursive_search(grid1, "s", [])
print(coordinates_saver)

coordinates_saver = []
recursive_search(grid1, "bit", [])
print(coordinates_saver)

The function outputs all the possible ways to get a word, as a list of coordinate lists. I believe the grid in the image and the grid given as text are different as well. Not sure if this is a typo.

Output:

[[(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 4)]]
[[(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 1), (1, 1), (2, 1), (3, 1)], [(1, 0), (0, 0), (0, 1), (1, 1)], [(1, 1), (0, 1), (0, 0), (1, 0)], [(2, 1), (1, 1), (0, 1), (0, 0)], [(3, 1), (2, 1), (1, 1), (0, 1)]]
[[(3, 2)]]
[[(0, 5), (1, 5), (2, 5)]]

Note: I had some issues with returning the coordinates. Somehow, when my code enters the first if statement, the return does not end the whole search immediately, so I had to use the outside list coordinates_saver to retrieve the results. If anyone can tell me why this return statement does not produce an object even when len(coordinates) == len(word), it would help me.
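A likely explanation is that the recursive call inside the for loop throws away its return value, so the result of the innermost call never travels back up and the outer calls fall off the end of the function, returning None. Below is a minimal sketch of propagating the result instead. It is not a drop-in fix for the code above: search_from is a hypothetical name, it assumes a non-empty word, and it only moves right or down as the question requires.

```python
def search_from(grid, word, r, c, path=()):
    """Hypothetical helper: return the matching path as a list, or None."""
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
        return None
    if grid[r][c] != word[len(path)]:
        return None
    # Build a new tuple per call, so there is no shared list to clean up.
    path = path + ((r, c),)
    if len(path) == len(word):
        return list(path)
    for dr, dc in ((0, 1), (1, 0)):  # right, then down
        found = search_from(grid, word, r + dr, c + dc, path)
        if found is not None:
            return found  # hand the result back up the call chain
    return None
```

With the grid above, search_from(grid1, "bit", 0, 5) returns [(0, 5), (1, 5), (2, 5)] directly, without needing an outside list.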
