
See the EDIT below.

I am coding a word ladder algorithm. The user enters a start word, an end word, and a hash set of all valid words (the lexicon). The algorithm returns all the shortest paths (more than one if several exist) from the start word to the end word. For example, with start_word = 'cold' and end_word = 'warm':

output = [[cold -> cord -> card -> ward -> warm], [/* another path, if one exists */]]

Each word differs from the previous one by exactly one character. I am using BFS to solve this problem. My strategy was to return all the paths and then select the shortest ones from the returned list. This is my code to return all the paths:

auto word_ladder::generate(std::string const& from, std::string const& to, absl::flat_hash_set<std::string> const& lexicon) -> std::vector<std::vector<std::string>> {

    absl::flat_hash_set<std::string> visited = {};
    std::queue<std::vector<std::string>> search_queue = {};
    std::vector<std::vector<std::string>> paths = {};

    search_queue.push(std::vector<std::string>{from});

    while (!search_queue.empty()) {
        auto word = search_queue.front();
        search_queue.pop();

        auto neighbours = generate_neighbours(word.back(), lexicon);
        for (auto &i: neighbours) {

            auto new_word = word;
            new_word.push_back(i);
            if (i == to) {
                paths.push_back(new_word);
                continue;
            }

            if (visited.find(i) != visited.end()) {
                continue;
            }

            search_queue.push(new_word);
            visited.insert(i);

        }
    }

    return paths;
}
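
The code above calls `generate_neighbours`, which is not shown in the question. For readers who want to run it, here is a minimal sketch of what such a helper might look like. The body is an assumption, not the asker's implementation: it assumes lowercase ASCII words and uses `std::unordered_set` in place of `absl::flat_hash_set` so that it compiles with the standard library alone.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the generate_neighbours helper the question calls.
// Assumption: words are lowercase ASCII; std::unordered_set replaces absl::flat_hash_set.
auto generate_neighbours(std::string const& word, std::unordered_set<std::string> const& lexicon)
    -> std::vector<std::string> {
    std::vector<std::string> neighbours;
    auto candidate = word;
    for (std::size_t pos = 0; pos < candidate.size(); ++pos) {
        char const original = candidate[pos];
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == original) {
                continue;  // skip the word itself
            }
            candidate[pos] = c;
            if (lexicon.count(candidate) != 0) {
                neighbours.push_back(candidate);  // one letter changed and still a valid word
            }
        }
        candidate[pos] = original;  // restore before moving to the next position
    }
    return neighbours;
}
```

Each call tries 25 substitutions per letter position, so the cost depends on the word length rather than on the size of the lexicon.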

It does return multiple paths, but the problem is that it doesn't return all of them. One of the paths it returns is:

1) awake, aware, sware, share, shire, shirr, shier, sheer, sheep, sleep

However, it doesn't return this path:

2) awake, aware, sware, share, sharn, shawn, shewn, sheen, sheep, sleep

I am pretty sure the reason is that my code marks the word "share" as visited the first time it encounters it (in path 1), so it never goes through the second path (path 2).

To solve this, I changed my for loop a bit:

    for (auto &i: neighbours) {

        auto new_word = word;
        new_word.push_back(i);
        if (i == to) {
            paths.push_back(new_word);
            continue;
        }

        for (auto &j: word) {
            if (j == i) {
                continue;
            }
        }

        search_queue.push(new_word);

    }

The idea was to check whether the word has already been visited in the path being tracked in the queue, rather than globally. However, for some reason this solution gets stuck in a loop somewhere and doesn't terminate (I am assuming because of the large dataset?).

Is there something wrong with my second version of the code, or does it just take too long because of the large dataset? How can I better achieve the solution?
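
One thing worth noting about the second loop: the `continue` inside `for (auto &j: word)` only advances that inner loop, so `search_queue.push(new_word)` runs for every neighbour regardless of the check, and paths can walk back and forth between the same words indefinitely. A minimal sketch of a per-path check that the outer loop can act on is below. The name `path_crossed` is borrowed from the code in the EDIT; its signature and body here are assumptions, since the question does not show it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if `candidate` already occurs somewhere on `path`.
// Assumed signature and body; the question does not show the real helper.
auto path_crossed(std::vector<std::string> const& path, std::string const& candidate) -> bool {
    return std::find(path.begin(), path.end(), candidate) != path.end();
}
```

With a helper like this, the outer loop can write `if (path_crossed(word, i)) { continue; }`, and that `continue` then skips the push as intended. The check prevents cycles within a single path, but the number of simple paths can still be very large.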

EDIT:

Instead of finding all the paths, I now first find the length of the shortest path and then run BFS down to that depth to collect all the paths at that depth.

auto word_ladder::generate(std::string const& from, std::string const& to, absl::flat_hash_set<std::string> const& lexicon) -> std::vector<std::vector<std::string>> {


    absl::flat_hash_set<std::string> visited = {};
    visited.insert(from);

    std::queue<std::vector<std::string>> search_queue = {};
    std::vector<std::vector<std::string>> paths = {};

    search_queue.push(std::vector<std::string>{from});

    auto length = find_shortest_path_length(from, to, lexicon);
    std::cout << "length is: " << length << "\n";
    // auto level = 0;

    std::unordered_map<std::string, int> level_track = {};
    level_track[from] = 0;

    while (!search_queue.empty() ) {
        auto word = search_queue.front();
        search_queue.pop();

        // **
        if (level_track[word.back()] <= length) {
            auto neighbours = generate_neighbours(word.back(), lexicon);
            const auto &parent = word.back();
            for (auto &i: neighbours) {

                auto new_word = word;
                new_word.push_back(i);
                if (i == to) {
                    paths.push_back(new_word);
                    std::cout << "The level at the path was " << level_track[parent] << "\n";
                    continue;
                }

                if (path_crossed(word, i)) {
                    continue;
                }


                search_queue.push(new_word);
                level_track[i] = level_track[parent] + 1;

            }
        }
    }
    return paths;
}

The solution now terminates, so the earlier problem was definitely the large number of searches. However, my algorithm still does not give the correct answer, because the way I keep track of the depth of my nodes (words) is somehow not correct.
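
A likely cause of the depth problem is that `level_track` is keyed by word: every time a word is enqueued again through a different path, `level_track[i]` is overwritten, so a shorter path ending in the same word later reads the wrong level. The depth of a queue entry is already available as `word.size() - 1`. As an alternative, here is a sketch of a level-by-level BFS that records each word's depth once, keeps every parent that reaches it at that depth, and then walks the parent links back from the end word. This is one common technique, not the asker's code; `all_shortest_ladders`, `neighbours_of`, `collect`, and the type aliases are hypothetical names, and `std::unordered_set` stands in for `absl::flat_hash_set`.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Lexicon = std::unordered_set<std::string>;  // stand-in for absl::flat_hash_set
using Ladder = std::vector<std::string>;
using Parents = std::unordered_map<std::string, std::vector<std::string>>;

// Hypothetical neighbour generator; assumes lowercase ASCII words.
auto neighbours_of(std::string const& word, Lexicon const& lexicon) -> std::vector<std::string> {
    std::vector<std::string> result;
    auto candidate = word;
    for (std::size_t pos = 0; pos < candidate.size(); ++pos) {
        char const original = candidate[pos];
        for (char c = 'a'; c <= 'z'; ++c) {
            candidate[pos] = c;
            if (c != original && lexicon.count(candidate) != 0) {
                result.push_back(candidate);
            }
        }
        candidate[pos] = original;
    }
    return result;
}

// Walks the parent links back from `word` to `from`, emitting every ladder found.
void collect(std::string const& word, std::string const& from, Parents const& parents,
             Ladder& suffix, std::vector<Ladder>& out) {
    suffix.push_back(word);
    if (word == from) {
        out.emplace_back(suffix.rbegin(), suffix.rend());  // suffix is stored end-to-start
    } else {
        for (auto const& parent : parents.at(word)) {
            collect(parent, from, parents, suffix, out);
        }
    }
    suffix.pop_back();
}

auto all_shortest_ladders(std::string const& from, std::string const& to, Lexicon const& lexicon)
    -> std::vector<Ladder> {
    std::unordered_map<std::string, int> depth{{from, 0}};
    Parents parents;
    std::vector<std::string> frontier{from};
    bool found = (from == to);

    while (!frontier.empty() && !found) {
        std::vector<std::string> next;
        for (auto const& word : frontier) {
            int const d = depth.at(word) + 1;
            for (auto const& n : neighbours_of(word, lexicon)) {
                auto const res = depth.emplace(n, d);  // records the depth only on first sight
                if (res.second) {
                    next.push_back(n);
                }
                if (res.first->second == d) {
                    parents[n].push_back(word);  // word reaches n along a shortest route
                }
                if (n == to) {
                    found = true;  // finish this level, then stop
                }
            }
        }
        frontier = std::move(next);
    }

    std::vector<Ladder> result;
    if (found) {
        Ladder suffix;
        collect(to, from, parents, suffix, result);
    }
    return result;
}
```

Because each word is expanded once, the search itself stays small; only the final enumeration grows with the number of shortest ladders, which can still be large for some inputs.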

ps1234
  • I don't understand the part about the shortest path. If you replace one character in each step then the length of the paths is always the same: number of characters different in input and output, no? – 463035818_is_not_an_ai Jun 18 '20 at 10:13
  • I am not sure what you mean. What I mean by the shortest path is the number of words you need to go through to get to the end word. In the example at the top, to get from the word 'cold' to 'warm', the output takes 4 steps. There may be many other ways to get to the end word. Another path may go through, say, 6 words to get to the end word. – ps1234 Jun 18 '20 at 10:24
  • your code and problem description is incomplete. In the meantime I realized that there is a dictionary of allowed words. You should provide a [mcve], but anyhow for code reviews of working code there is https://codereview.stackexchange.com/ – 463035818_is_not_an_ai Jun 18 '20 at 10:46
  • you can do some preparation of the dictionary. If you construct a `std::map>` that maps all words from the dictionary to all words that can be reached in one step the look up will be much more efficient. And finding all paths is then only a matter of traversing that map – 463035818_is_not_an_ai Jun 18 '20 at 10:48
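
The preprocessing suggested in the last comment could look roughly like the sketch below: build, once, a map from every word in the dictionary to the words reachable from it in one step. The name `build_adjacency` and the `AdjacencyMap` alias are hypothetical, the words are assumed to be lowercase ASCII, and `std::unordered_map` with `std::vector` values is one possible choice of container, not necessarily the one the commenter had in mind.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical alias: each word maps to the words one letter away from it.
using AdjacencyMap = std::unordered_map<std::string, std::vector<std::string>>;

// Builds the one-step adjacency for every word in the lexicon.
// Assumption: lowercase ASCII words; std::unordered_set replaces absl::flat_hash_set.
auto build_adjacency(std::unordered_set<std::string> const& lexicon) -> AdjacencyMap {
    AdjacencyMap adjacency;
    for (auto const& word : lexicon) {
        auto& reachable = adjacency[word];  // creates an empty entry even for isolated words
        auto candidate = word;
        for (std::size_t pos = 0; pos < candidate.size(); ++pos) {
            char const original = candidate[pos];
            for (char c = 'a'; c <= 'z'; ++c) {
                candidate[pos] = c;
                if (c != original && lexicon.count(candidate) != 0) {
                    reachable.push_back(candidate);
                }
            }
            candidate[pos] = original;
        }
    }
    return adjacency;
}
```

The search then replaces each neighbour-generation call with a lookup such as `adjacency.at(word)`, which pays off when the same words are expanded many times.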

1 Answer


You're trying to find an efficient solution, but most probably one doesn't exist. See this answer. Enumerating all shortest paths can be very costly.

OrenIshShalom