Union-Find leetcode question exceeding time limit

Question

I am solving this LeetCode problem, https://leetcode.com/problems/sentence-similarity-ii/description/, which involves using union-find to decide whether two sentences are similar, given a list of pairs of similar words. I implemented ranked union-find: I keep track of the size of each subset and join the smaller subtree to the bigger one. For some reason the code still exceeds the time limit. Can someone point out what I am doing wrong, and how it can be optimized further? Other accepted solutions that I saw use the same ranked union-find algorithm.

Here is the code:

    string root(map<string, string> dict, string element) {
        if(dict[element] == element)
            return element;
        return root(dict, dict[element]);
    }
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if(words1.size() != words2.size()) return false;
        std::map<string, string> dict;
        std::map<string, int> sizes;
        for(auto pair: pairs) {
            if(dict.find(pair.first) == dict.end()) {
                dict[pair.first] = pair.first;
                sizes[pair.first] = 1;
            }
            if(dict.find(pair.second) == dict.end()) {
                dict[pair.second] = pair.second;
                sizes[pair.second] = 1;
            }

            auto firstRoot = root(dict, pair.first);
            auto secondRoot = root(dict, pair.second);
            if(sizes[firstRoot] < sizes[secondRoot]) {
                dict[firstRoot] = secondRoot;
                sizes[firstRoot] += sizes[secondRoot];
            }
            else {
                dict[secondRoot] = firstRoot;
                sizes[secondRoot] += sizes[firstRoot];
            }
        }

        for(int i = 0; i < words1.size(); i++) {
            if(words1[i] == words2[i]) {
                continue;
            }
            else if(root(dict, words1[i]) != root(dict, words2[i])) {
                return false;
            }
        }
        return true;
    }

Thanks!

Well if `dict` is large then passing it by value (to the `root` function) might not be such a good idea. Passing it as a *`const`* reference could help. — Some programmer dude, Sep 05 '18 at 07:16
Damn that was it, I can't believe myself. Thank you so much. — Kareem Aboughazala, Sep 05 '18 at 07:18

score 0 · Answer 1 · answered Sep 05 '18 at 13:31

Your union-find is broken with respect to complexity. Please read Wikipedia: Disjoint-set data structure.

For union-find to reach its near-O(1) amortized cost per operation, it has to use path compression (also called path compaction). For that, your root method has to:

Take dict by reference, so that it can modify it.
Compress the path: make every element visited on the way point directly at the root.
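The two points above can be sketched as follows. This is only a minimal illustration, not a drop-in replacement: it keeps the question's `root` name and `std::map` layout, and it assumes that a word missing from `dict` should count as its own root (the posted code does not handle that case explicitly).

```cpp
#include <map>
#include <string>

// Find the representative of `element` and compress the path on the way back.
// `dict` is taken by non-const reference so that parent links can be rewritten
// and so that the map is never copied.
// Assumption: a word that is not in `dict` is treated as its own root.
std::string root(std::map<std::string, std::string>& dict,
                 const std::string& element) {
    auto it = dict.find(element);
    if (it == dict.end() || it->second == element)
        return element;
    // Resolve the root first, then point this element directly at it.
    std::string r = root(dict, it->second);
    it->second = r;
    return r;
}
```

After one call on a chain such as a -> b -> c, both a and b point straight at c, so later lookups on either of them take a single step.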

Without path compression, union by size alone still gives O(log N) steps per root() call, which could be OK. But even for that, root() has to take dict by reference and not by value. Passing dict by value copies the whole map, which costs O(N) on every call, including each recursive one.
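If you prefer to skip path compression, a read-only lookup through a const reference might look like the sketch below. The names `ParentMap`, `findRoot` and `unite` are made up for this illustration. Note also that the posted loop appears to add to the size of the root that was just attached rather than the one that stays a root; the sketch updates the surviving root instead, which is what union by size needs for the O(log N) bound.

```cpp
#include <map>
#include <string>
#include <utility>

using ParentMap = std::map<std::string, std::string>;  // hypothetical alias

// Read-only lookup: the const reference avoids copying the map on each call.
// Assumption: a word that is not in `dict` is treated as its own root.
std::string findRoot(const ParentMap& dict, std::string element) {
    for (;;) {
        auto it = dict.find(element);
        if (it == dict.end() || it->second == element)
            return element;
        element = it->second;  // step up to the parent
    }
}

// Union by size: attach the smaller tree under the larger one and
// record the merged size on the root that survives.
void unite(ParentMap& dict, std::map<std::string, int>& sizes,
           const std::string& a, const std::string& b) {
    // Register unseen words as singleton sets.
    if (dict.emplace(a, a).second) sizes[a] = 1;
    if (dict.emplace(b, b).second) sizes[b] = 1;

    std::string ra = findRoot(dict, a);
    std::string rb = findRoot(dict, b);
    if (ra == rb) return;                           // already in the same set
    if (sizes[ra] < sizes[rb]) std::swap(ra, rb);   // make ra the larger tree
    dict[rb] = ra;
    sizes[ra] += sizes[rb];
}
```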

Because dict is a std::map, every lookup costs O(log N) string comparisons instead of O(1). std::unordered_map lookups are O(1) on average, but in practice, for small N (roughly N < 1000), std::map is often just as fast or faster. Also, even if std::unordered_map is used, hashing a string costs O(len(str)).

If the data is big and performance is still a problem, you may gain from working with integer indexes instead of strings: give each distinct word an index and run union-find over a vector<int>. This is error-prone, since the same string can appear in several pairs and must always map to the same index.
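A rough sketch of the index-based variant is shown below. `IndexUnionFind` and `similarByIndex` are hypothetical names, the lookup uses path halving as a simple form of path compression, and words that never appear in any pair are assumed to be similar only to themselves. Duplicates are handled by looking each string up in an `std::unordered_map` before assigning a new index.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: union-find over integer ids 0..n-1.
struct IndexUnionFind {
    std::vector<int> parent, size;

    int add() {  // create a new singleton set and return its id
        int id = static_cast<int>(parent.size());
        parent.push_back(id);
        size.push_back(1);
        return id;
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);  // a is the larger tree
        parent[b] = a;
        size[a] += size[b];
    }
};

bool similarByIndex(const std::vector<std::string>& words1,
                    const std::vector<std::string>& words2,
                    const std::vector<std::pair<std::string, std::string>>& pairs) {
    if (words1.size() != words2.size()) return false;

    std::unordered_map<std::string, int> id;  // each distinct word gets one id
    IndexUnionFind uf;
    auto idOf = [&](const std::string& w) {
        auto it = id.find(w);
        if (it != id.end()) return it->second;  // duplicate string: reuse its id
        int n = uf.add();
        id.emplace(w, n);
        return n;
    };
    for (const auto& p : pairs) uf.unite(idOf(p.first), idOf(p.second));

    for (std::size_t i = 0; i < words1.size(); ++i) {
        if (words1[i] == words2[i]) continue;
        auto a = id.find(words1[i]);
        auto b = id.find(words2[i]);
        // A word absent from every pair can only match itself.
        if (a == id.end() || b == id.end()) return false;
        if (uf.find(a->second) != uf.find(b->second)) return false;
    }
    return true;
}
```

Strings are hashed once per occurrence here, and all the union-find work afterwards touches only integers.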
