
I am taking an online data structures course. Here is my Python implementation of a merge-and-find (union-find) algorithm. I followed the instructions, but the running time far exceeds the limit. Can anyone take a look? It should be a simple one. Thanks.

We must do m merge (union) operations. destination_i and source_i are the numbers of two tables with real data. If destination_i ≠ source_i, copy all the rows from table source_i to table destination_i, then clear table source_i and, instead of real data, put a symbolic link to destination_i into it. After each operation, the answer is the maximum table size over all tables. An example of the order of operations:

5 5
1 1 1 1 1
3 5
2 4
1 4
5 4
5 3

The input shows there are 5 tables, and we want to do 5 operations. Each of the tables has size 1. The following five lines show that we merge source 5 into destination 3, then source 4 into destination 2, and so on (each line is destination first, then source). The output should be:

2
2
3
5
5

Explanation: In this sample, all the tables initially have exactly 1 row of data. Consider the merging operations:

  1. All the data from the table 5 is copied to table number 3. Table 5 now contains only a symbolic link to table 3, while table 3 has 2 rows. 2 becomes the new maximum size.

  2. 2 and 4 are merged in the same way as 3 and 5.

  3. We are trying to merge 1 and 4, but 4 has a symbolic link pointing to 2, so we actually copy all the data from the table number 2 to the table number 1, clear the table number 2 and put a symbolic link to the table number 1 in it. Table 1 now has 3 rows of data, and 3 becomes the new maximum size.

  4. Traversing the path of symbolic links from 4 we have 4 → 2 → 1, and the path from 5 is 5 → 3. So we are actually merging tables 3 and 1. We copy all the rows from the table number 1 into the table number 3, and now the table number 3 has 5 rows of data, which is the new maximum.

  5. All tables now directly or indirectly point to table 3, so all other merges won’t change anything.
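The semantics of the five steps above can be checked with a naive simulation that really copies rows and follows symbolic links. The function name `simulate` is mine; this is only for illustrating the rules and is far too slow for the real limits:

```python
# Naive simulation of the sample: real row copies plus symbolic links.
def simulate(sizes, operations):
    n = len(sizes)
    rows = {i: ['r'] * sizes[i - 1] for i in range(1, n + 1)}  # table -> rows
    link = {}                          # table -> table its symbolic link points to
    maxima = []
    for dest, src in operations:
        # follow chains of symbolic links to the real tables
        while dest in link:
            dest = link[dest]
        while src in link:
            src = link[src]
        if dest != src:
            rows[dest].extend(rows[src])   # copy all rows over
            rows[src] = []                 # clear the source table
            link[src] = dest               # leave a symbolic link behind
        maxima.append(max(len(r) for r in rows.values()))
    return maxima

print(simulate([1, 1, 1, 1, 1], [(3, 5), (2, 4), (1, 4), (5, 4), (5, 3)]))
# -> [2, 2, 3, 5, 5]
```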

Instruction: Think about how to use disjoint set union with the path compression and union by rank heuristics to solve this problem. In particular, separate in your thinking the data structure that performs union/find operations from the actual merging of tables. If you are asked to merge the first table into the second, but the rank of the second table is smaller than the rank of the first, you can ignore the requested order in the Disjoint Set Union data structure and instead join the node corresponding to the second table to the node corresponding to the first table. However, you will then need to store the number of the actual table into which you were requested to merge, so the nodes of your Disjoint Set Union need an additional field for it.
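As a sketch of what the instruction describes (class and field names are my own, not from the course): keep the height-based rank separate from the table size, and let union by rank pick the attachment direction while the row count accumulates at whichever node becomes the root. The extra "actual table number" field the instruction mentions is only needed if you must report where the rows physically live; for the maximum-size answer, a size array at the roots suffices:

```python
# Sketch: union by rank (tree height) kept separate from table size.
class DSU:
    def __init__(self, sizes):
        n = len(sizes)
        self.parent = list(range(n))
        self.rank = [0] * n          # tree-height heuristic, not row count
        self.size = sizes[:]         # rows currently stored at each root
        self.max_size = max(sizes)

    def find(self, i):
        if self.parent[i] != i:
            self.parent[i] = self.find(self.parent[i])  # path compression
        return self.parent[i]

    def union(self, dest, src):
        rd, rs = self.find(dest), self.find(src)
        if rd == rs:
            return
        # union by rank may ignore the requested direction ...
        if self.rank[rd] < self.rank[rs]:
            rd, rs = rs, rd
        self.parent[rs] = rd
        if self.rank[rd] == self.rank[rs]:
            self.rank[rd] += 1
        # ... but all rows still end up counted at a single root
        self.size[rd] += self.size[rs]
        self.size[rs] = 0
        self.max_size = max(self.max_size, self.size[rd])
```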

Here is my code to implement it using rank heuristic and path compression:

# python2
import sys

n, m = map(int, sys.stdin.readline().split())
lines = list(map(int, sys.stdin.readline().split()))
rank = lines[:]              # table size doubles as the rank here
rank_original = lines[:]
parent = list(range(n))
ans = max(lines)


def getParent(i):
    # find the root and compress the path
    if i != parent[i]:
        parent[i] = getParent(parent[i])
    return parent[i]

def merge(destination, source):
    realDestination, realSource = getParent(destination), getParent(source)

    if realDestination == realSource:
        return False
    if rank[realDestination] >= rank[realSource]:
        parent[realSource] = realDestination
        rank[realDestination] += rank[realSource]
        rank_original[realDestination] = rank[realDestination]
    else:
        parent[realDestination] = realSource
        rank[realSource] += rank[realDestination]
        rank_original[realDestination] = rank[realSource]

    rank_original[source] = 0
    return True

for i in range(m):
    destination, source = map(int, sys.stdin.readline().split())
    merge(destination - 1, source - 1)
    ans = max(rank)      # recomputed over all n tables on every operation
    print(ans)

1 Answer


The problem is that you're calling max over the whole rank array on every round, which makes the total running time O(n·m). Instead, call max once on the initial data, store the result, and after each merge update it if the destination table has become larger than the current maximum. With path compression and union by rank this brings the total down to roughly O(n + m).

n, m = map(int, raw_input().split())
rank = [0] + map(int, raw_input().split())   # 1-based table sizes
parent = range(n + 1)
current_max = max(rank)

def find_parent(x):
    if parent[x] != x:
        parent[x] = find_parent(parent[x])   # path compression
    return parent[x]

for dest, source in (map(int, raw_input().split()) for _ in xrange(m)):
    dest, source = find_parent(dest), find_parent(source)
    if dest != source:
        if rank[source] > rank[dest]:        # union by size: smaller joins larger
            source, dest = dest, source
        parent[source] = dest
        rank[dest] += rank[source]

    current_max = max(current_max, rank[dest])
    print current_max
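For anyone running this on Python 3, where `raw_input`, `xrange`, and the `print` statement no longer exist, the same idea can be packaged as a function (the name `merge_tables` is mine); `find` is written iteratively here, which sidesteps recursion entirely:

```python
def merge_tables(sizes, operations):
    """Return the running maximum table size after each merge (1-based tables)."""
    n = len(sizes)
    rank = [0] + list(sizes)          # rank[i] = rows stored at root i
    parent = list(range(n + 1))
    current_max = max(rank)

    def find(x):
        root = x
        while parent[root] != root:   # first pass: walk up to the root
            root = parent[root]
        while parent[x] != root:      # second pass: path compression
            parent[x], x = root, parent[x]
        return root

    result = []
    for dest, src in operations:
        dest, src = find(dest), find(src)
        if dest != src:
            if rank[src] > rank[dest]:       # union by size
                src, dest = dest, src
            parent[src] = dest
            rank[dest] += rank[src]
            current_max = max(current_max, rank[dest])
        result.append(current_max)
    return result

print(merge_tables([1, 1, 1, 1, 1],
                   [(3, 5), (2, 4), (1, 4), (5, 4), (5, 3)]))
# -> [2, 2, 3, 5, 5]
```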
niemmi
  • I used your trick, but the system still shows: Failed case #11/132: time limit exceeded (Time used: 11.99/6.00, memory used: 158879744/536870912.) – patrickkkkk Aug 31 '16 at 18:01
  • @patrickkkkk With the provided information it's a bit difficult to figure out what the problem might be. If you could add the instructions, example input and expected output to the question, you might get a better result. Full-size input is not needed, just a valid example. – niemmi Sep 01 '16 at 06:48
  • Thank you for your reply. I have edited my question. I think I used the rank heuristic and path compression correctly, but why does it cost that much time to run? – patrickkkkk Sep 01 '16 at 20:00
  • @patrickkkkk: Updated my answer based on the additional information. – niemmi Sep 02 '16 at 02:27
  • It really works! I finally understood what was wrong in my original code. Thank you so much niemmi! – patrickkkkk Sep 02 '16 at 03:36