Time complexity of this algorithm: Word Ladder

Question

Question:

Given two words (beginWord and endWord), and a dictionary's word list, find all shortest transformation sequence(s) from beginWord to endWord, such that:

Only one letter can be changed at a time. Each transformed word must exist in the word list. Note that beginWord is not a transformed word.

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]

Output: [ ["hit","hot","dot","dog","cog"], ["hit","hot","lot","log","cog"] ]

My solution is based on this idea, but how do I analyze the time and space complexity of this solution?

1) Perform a BFS starting at beginWord by transforming every letter to one of 26 letters, and see if the transformed word is in the wordList, if so, put in queue.

2) During BFS, maintain a graph of {word:nextWord} for all valid next words

3) When a nextWord reaches endWord, do a backtracking DFS (pre-order traversal) on the graph to get all paths.

class Solution:
    def findLadders(self, beginWord, endWord, wordList):
        """
        :type beginWord: str
        :type endWord: str
        :type wordList: List[str]
        :rtype: List[List[str]]
        """
        wordListSet = set(wordList+[beginWord])
        graph = collections.defaultdict(list)
        q = set([beginWord])    
        count = 0
        result = []
        while q:
            count +=1
            newQ = set()
            for word in q:
                wordListSet.remove(word)
            for word in q:
                if word == endWord:                                        
                    self.getAllPaths(graph, beginWord, endWord, result, [])
                    return result
                for i in range(len(word)):
                    for sub in 'abcdefghijklmnopqrstuvwxyz':
                        if sub != word[i]:
                            newWord = word[:i] + sub + word[i+1:]
                            if newWord in wordListSet:
                                graph[word].append(newWord)
                                newQ.add(newWord)
            q = newQ
        return []

    def getAllPaths(self, graph, node, target, result, output):
        #This is just a backtracking pre-order traversal DFS on a DAG.
        output.append(node)
        if node==target:
            result.append(output[:])
        else:
            for child in graph[node]:
                self.getAllPaths(graph,child, target, result, output)
                output.pop()

I have a hard time coming up with the time and space complexity of it. My contention:

Time: O(26*L*N + N), where L is average length of each word, and N is the number of words in the wordList. Worst case here is every word transformed happens to be in the list, so each transformation needs 26 * length of word. The DFS part is just O(N). So asymptotically it's just O(L*N)

Space: O(N)

The time complexity i'm unsure, it seems need to be bigger than linear but I can't convince myself of so. — user1008636, Nov 03 '18 at 02:12
Graph search is linear in the size of the graph when the (potential) out-degree is fixed, as here. — Davis Herring, Nov 03 '18 at 17:55
Is my observation correct, that the dictionary property (sorted alphabetically) doesn't help in the average case because the change in leading characters makes the order worthless? — Vroomfondel, Nov 07 '18 at 09:01
This is not the question, but have you considered using the levenshtein distance? It would compute it in O(N * L**2) (Accroding to your notations) which is assymptotically worse than your solution. But better in many cases (Since L < 26) . — BlueSheepToken, Nov 08 '18 at 14:21

score 5 · Answer 1 · edited Nov 09 '18 at 17:09

5

You won't find all simple paths because there might be alternative shortest paths to the end word. The simplest counterexample is as follows:

beginWord = aa,
endWord = bb
wordList = [aa, ab, ba, bb]

Your algorithm would miss the path aa -> ba -> bb. In fact, it will always find at most one path.

The time complexity is indeed O(L * N) as you wrote but the space complexity is O(L*N) which is the space that your graph or wordList occupies.

edited Nov 09 '18 at 17:09

vencaslac

2,727
1
18
29

answered Nov 08 '18 at 16:39

Mikhail Berlinkov

1,624
10
15

I think it does get all the paths. At each word in the path, graph[] contains a list of all transistions which is then handled by annoying recursive routine. Its hard to read from the poor typing and documentation. – Charles Merriam Nov 10 '18 at 18:25
@CharlesMerriam By the time self.getAllPaths is being called graph is not built to include all transitions to a final word (as well as prior transitions). So, this recursive routine wouldn't work as it uses incomplete graph. – Mikhail Berlinkov Nov 10 '18 at 19:51
An optimal solution to this problem would be as follows. In the first step, we compute distances to all words until we reach the endWord. In the second step, starting from endWord using DFS we traverse all paths going through words with distances decreasing by 1 until we reach beginWord. – Mikhail Berlinkov Nov 10 '18 at 20:04
You are correct. There should be a flag, and then finish the `for word in q` loop before returning. – Charles Merriam Nov 11 '18 at 00:49
@CharlesMerriam Unfortunately, this flag wouldn't help because as we miss other shortest paths to previous words in a path. – Mikhail Berlinkov Nov 11 '18 at 13:46
I respectfully disagree. I linked to the corrected code to my answer below as an existence. I will agree that this is an expired equine. – Charles Merriam Nov 13 '18 at 09:06
@CharlesMerriam The code you provided a link seems to be correct. Although, I thought you meant something different and I haven't found any flag in there that you wrote about. In this variant, however, the algorithm will continue to build a graph even after finding all shortest paths, although it's not negligible. – Mikhail Berlinkov Nov 13 '18 at 15:29
My issue with language. Yes, its pretty bad code to start. Yes, the loop should break once it completes an iteration for that finds at least one link. – Charles Merriam Nov 14 '18 at 15:43
The time-complexity is `O(L^2 * N)`. The line `if newWord in wordListSet` executes in `O(L)`, not constant time. – Nov 21 '20 at 17:13

score 1 · Answer 2 · answered Jan 09 '21 at 22:40

1

the answer should be O(L^2 * n)

In the process of building a new word, it costs O(L^2) in total. Firstly we loop the current word, that costs O(L); then for building each new string: newWord = word[:i] + sub + word[i+1:], this cost another O(L)

answered Jan 09 '21 at 22:40

Zhenyi Lin

111
1
5

Charles Merriam · Answer 3 · 2018-11-13T09:04:14.197

This sounds like a fun problem. Yes, the answer is O(L * N). If you fixed your code to return all solutions, the recursive print routine is O(L!).

You have have this outer loop, for all nodes being considered. This can be equal to the length of your wordlist. Consider the fully connected set of three letter combinations ['aaa', 'aab', ... 'zzz']. The node count is 26^3, or 27576. Transforming from aaa to zzz has six answers: aaa->zaa->zza->zzz, aaa->zaa->aza->zzz, aaa->aza->zza->zzz, etc. You would be considering all length three paths, (25+25+25)(25+25)(25) or 93,750 paths to be sure there wasn't a shorter path.
You have two choices for the inner loop: for i in range(len(word)) and your recursive call to get_all_paths() to list all the paths. You know you have an order of length_of_word for the inner, implying O(L * N). Note that O(L * N * 26) means the same thing; big O notation only cares about the scale of changes. I haven't proved you stay linear on that get_all_paths loop.
This is a special case of Dijkstra's Shortest Path. You can do better adding a heuristic to your specific problem. The total path length through a node is always greater than or equal to the distance so far plus the number of letters still wrong. That means, in the fully connected case, you have aaa (0 length)->aab (1)->abb (2)->bbb (3) so you avoid exploring aaa (0 actual + 3 heuristic) -> aab (1 actual + 3 heuristic).
You can correct your code to return all the word ladders, and I did so here. The problem is that the recursive getAllPaths() routine now grows faster then O(L * N). In the code sample, an input has two sets of "path choices", or subgraphs, set of which multiplies the number of paths. So, tripling the number of nodes would triple the number of path choices, cubing the number of path choices.

Time complexity of this algorithm: Word Ladder

3 Answers3

Linked