interview prep: optimizing swapLexOrder

Question

This is an interview hash map question from CodeFights, and I need help optimizing my brute-force solution. Here is the problem:

Given a string str and array of pairs that indicates which indices in the string can be swapped, return the lexicographically largest string that results from doing the allowed swaps. You can swap indices any number of times.

Example

For str = "abdc" and pairs = [[1, 4], [3, 4]], the output should be
swapLexOrder(str, pairs) = "dbca".

By swapping the given indices, you get the strings: "cbda", "cbad", "dbac", "dbca". The lexicographically largest string in this list is "dbca".

My current solution

Brute force by continually adding all possibilities until there are no new solutions. This is too slow for swapLexOrder('dznsxamwoj',[[1,2],[3,4],[6,5],[8,10]]); it doesn't finish on my machine. Any hints for optimizing? An easier test case that passes is swapLexOrder('abdc',[[1,4],[3,4]]) = 'dbca'.

def swapLexOrder(str, pairs):
    d = {}
    d[str]=True
    while True:
        oldlen=len(d)
        for x,y in pairs:
            for s in d.keys():
                d[swp(s,x,y)]=True
        if len(d) == oldlen:
            #no more new combinations.
            return sorted(d)[-1]

def swp(str,x,y):
    x=x-1
    y=y-1
    a=str[x]
    b=str[y]
    return str[0:x]+b+str[x+1:y]+a+str[y+1:]

Izaak van Dongen · Accepted Answer · 2017-08-27T10:20:24.773

My proposed solution is to first 'link' as many pairs as possible to form sets of indices which can be interchanged, e.g. in your first example, [[1, 4], [3, 4]] becomes [[1, 3, 4]]. The characters at each of these sets of indices can then be sorted in descending order and written back to those indices to form the output. The implementation comes to this:

def build_permitted_subs(pairs):
    perm = []

    for a, b in pairs:
        merged = False
        for ind, sub_perm in enumerate(perm):
            if a in sub_perm or b in sub_perm:
                sub_perm.add(a)
                sub_perm.add(b)
                merged = True
                break

        else:
            perm.append(set([a, b]))

        if merged:
            for merge_perm_ind in reversed(range(ind + 1, len(perm))):
                if perm[merge_perm_ind] & sub_perm:
                    sub_perm.update(perm[merge_perm_ind])
                    perm.pop(merge_perm_ind)

    return list(map(sorted, perm))

def swap_lex_order(swap_str, _pairs):

    pairs = [[a - 1, b - 1] for a, b in _pairs]
    out = list(swap_str)

    perm = build_permitted_subs(pairs)

    for sub_perm in perm:
        sorted_subset = sorted(sub_perm, key=lambda ind: swap_str[ind], reverse=True)

        for sort, targ in zip(sorted_subset, sub_perm):
            out[targ] = swap_str[sort]

    return "".join(out)

print(swap_lex_order("dznsxamwoj", [[1, 2], [3, 4], [6, 5], [8, 10]]))
print(swap_lex_order("abdc", [[1, 4], [3, 4]]))
print(swap_lex_order("acxrabdz",[[1,3], [6,8], [3,8], [2,7]]))

with output:

zdsnxamwoj
dbca
zdxrabca

I've also renamed your parameters not to use str, which is already a pretty fundamental Python builtin. Note that my code may not be as Pythonic as possible, but I think it works well enough to illustrate the algorithm, and it's not suffering from any major performance hits. I suspect this approach has a pretty low complexity - it's generally 'intelligent' in that it doesn't brute force anything, and uses O(n log n) sorts. The first example seems to be right. Note that this transforms each pair to be 0-based as this is much easier for Python.

This relies on being able to form any permutation of a linked group (which is what sorting it needs) from the individual permitted swaps. This may not be entirely intuitive, but it might help to realise that you can move an element to any position in a list using only adjacent swaps (by repeatedly swapping it in the direction it needs to go). Bubble sort is an example of permuting a list using only adjacent swaps: since any ordering can be bubble sorted into sorted order, running those same swaps in reverse takes the sorted order back to that ordering, so every permutation is reachable.
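As a rough illustration of the bubble sort argument, here is a minimal sketch that sorts a group's characters into descending order using nothing but adjacent swaps, and records the swaps it made. The function name `bubble_sort_descending` is made up for this example and is not part of the answer's code. It only covers the case where the linked indices form a simple chain; it is not a proof for arbitrary sets of pairs.

```python
def bubble_sort_descending(items):
    """Sort into descending order using only adjacent swaps.

    Returns the sorted list and the swaps performed as (i, i + 1) tuples.
    """
    items = list(items)
    swaps = []
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] < items[i + 1]:
                # Only neighbouring positions are ever exchanged.
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps.append((i, i + 1))
    return items, swaps


# The characters a, x, b, z are the ones sitting at the linked indices
# of the third test case above.
result, swaps = bubble_sort_descending("axbz")
print("".join(result))  # zxba
print(len(swaps), "adjacent swaps were enough")
```

Replaying the recorded swaps on the original list reproduces the sorted result, which is the sense in which the permitted swaps are "enough" to sort a chain.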

If you have any questions, or anything doesn't work, let me know and I'll start elaborating/debugging. (As of 19:28 GMT I've already noticed one bug and edited it out : ). Bug #2 (with the duplicated z at test case 3) should also be fixed.

A little more on bug #1:

I hadn't sorted the indices returned by build_permitted_subs, so it couldn't sort them properly with reference to swap_str.

More on bug #2:

The build_permitted_subs function wasn't working properly - specifically, if it met a pair that could go into two groups, meaning those groups should also join together, this didn't happen, and there would now be two groups that shouldn't be separate. This leads to z duplication as both groups can draw from the z. I've sloppily fixed this with a flag and a retroactive for loop.
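To make the failure mode concrete, here is a small sketch of the merging step written a different way from build_permitted_subs: every existing group that a new pair touches is absorbed into one set. The name `merge_pairs` is hypothetical and the indices are assumed to be 0-based already; treat it as an illustration of the idea rather than a drop-in replacement.

```python
def merge_pairs(pairs):
    """Group indices so that indices linked by a chain of pairs share a group."""
    groups = []
    for a, b in pairs:
        merged = {a, b}
        untouched = []
        for group in groups:
            if group & merged:
                merged |= group          # the pair bridges into this group
            else:
                untouched.append(group)  # unrelated group, keep as it is
        groups = untouched + [merged]
    return sorted(sorted(group) for group in groups)


# The pair [2, 7] touches both {0, 2} and {5, 7}, so the two must join.
print(merge_pairs([[0, 2], [5, 7], [2, 7], [1, 6]]))
# [[0, 2, 5, 7], [1, 6]]
```

If the two groups were left separate, index 7 (the z) would appear in both, which matches the duplicated z described above.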

Haha, sorry about using "str" copied over from the question. I think it's pretty pythonic, unfortunately failed one of the test cases. For `swap_lex_order("acxrabdz",[[1,3], [6,8], [3,8], [2,7]])= zdxrazcb` when the actual answer is `zdxrabca`. The code is adding an extra letter z somehow, it's a little late here but I'll try to find the bug tomorrow. I'm suspecting looping through each sub_perm is messing it up somehow similar to bug1. Thanks for answering, learned some cool tricks — bhuj2000, Aug 27 '17 at 04:05
Darn, sorry about that. It should now be fixed, although I haven't tested much, I'll look into it more after work — Izaak van Dongen, Aug 27 '17 at 10:20

score 0 · Answer 2 · answered Sep 26 '17 at 02:09

This one perhaps works better.

def swapLexOrder(str_, pairs):
    n = len(str_)
    str_ = list(str_)

    corr = [set() for _ in range(n)]
    nodes = set()
    for a, b in pairs:
        corr[a-1].add(b-1)
        corr[b-1].add(a-1)
        nodes.add(a-1)
        nodes.add(b-1)

    while nodes:
        active = {nodes.pop()}
        group = set()
        while active:
            group |= active
            nodes -= active
            active = {y for x in active for y in corr[x] if y in nodes}

        chars = iter(sorted((str_[i] for i in group), reverse=True))
        for i in sorted(group):
            str_[i] = next(chars)

    return "".join(str_)

score 0 · Answer 3 · answered Apr 27 '21 at 02:19

def swapLexOrder(str, pairs):
    
    if not str or not pairs:
        return str  # nothing to swap: empty string or no pairs
    lst = [''] + list(str)
    setted_pairs = list(map(set, pairs))
    while setted_pairs:
        path = setted_pairs.pop(0)
        while True:
            path1 = path.copy()
            for pair in setted_pairs[:]:  # iterate over a copy, since pairs are removed below
                if path1 & pair:
                    path |= pair
                    setted_pairs.remove(pair)
            if path == path1:
                break
        optimal = sorted(lst[i] for i in path)
        for i, v in enumerate(sorted(path, reverse=True)):
            lst[v] = optimal[i]
    return ''.join(lst[1:])

score 0 · Answer 4 · answered Oct 13 '21 at 13:14

My preferred solution uses a disjoint set (union-find) to solve this problem. The key idea is to treat the pairs as edges of a graph and find what is connected, kind of like chaining linked lists together. Each connected group is a set of positions whose characters can be rearranged freely. Once you have figured out what is connected, you can sort the characters of each group and then pick the lexicographically largest remaining character from a position's group when building the string.

The disjoint set helps a lot here because it lets us figure out what's connected extremely quickly. With path compression and union by size, each operation is actually faster than O(log n): the amortized cost is O(log* n). I recommend reading the Wikipedia page for an explanation. By using the union function, we can build the "linked list" from the given pairs.
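Before the full implementation, here is a stripped-down sketch of just the grouping step, to show what the disjoint set produces for one of the test cases in this thread. The helper name `groups_from_pairs` is hypothetical, and to stay short it uses path halving without union by size, so it does not carry the same complexity bound as the class in this answer.

```python
def groups_from_pairs(n, pairs):
    """Return the groups of 0-based indices connected by the 1-based pairs."""
    parent = list(range(n))

    def find(i):
        # Path halving: point each visited node at its grandparent.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in pairs:
        root_a, root_b = find(a - 1), find(b - 1)
        if root_a != root_b:
            parent[root_b] = root_a  # union: hang one root under the other

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(groups_from_pairs(8, [[1, 3], [6, 8], [3, 8], [2, 7]]))
# [[0, 2, 5, 7], [1, 6], [3], [4]]
```

Indices that appear in no pair end up in singleton groups, so their characters stay where they are.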

import collections

class DisjointSet:
    def __init__(self, string, pairs):
        self.parent = [i for i in range(len(string))]
        self.size = [1] * len(string)
        
        for a, b in pairs:
            self.union(a-1, b-1)
    
    def find_parent(self, idx):
        # O(log*(n))
        if self.parent[idx] == idx:
            return idx
        self.parent[idx] = self.find_parent(self.parent[idx])
        return self.parent[idx]
    
    def union(self, a, b):
        # O(log*(n))
        x = self.find_parent(a)
        y = self.find_parent(b)
        
        if x == y:
            return
        
        if self.size[x] < self.size[y]:
            x, y = y, x
        
        self.parent[y] = x
        self.size[x] = self.size[x] + self.size[y]

def swapLexOrder(string, pairs):
    # O(nlogn) + O(nlog*(n))
    string = list(string)
    # Build the disjoint set to figure out what pairs are connected
    disjoint = DisjointSet(string, pairs)
    graph = collections.defaultdict(list)
    
    # With the disjoint set, build the substrings connected by the pairs
    for i, c in enumerate(string):
        graph[disjoint.find_parent(i)].append(c)
    
    # Sort the substrings
    for group in graph.values():
        group.sort()
    
    # Build the answer by picking the most lexicographic out of the substrings
    for i in range(len(string)):
        parent = disjoint.find_parent(i)
        string[i] = graph[parent][-1]
        graph[parent].pop()
    
    return "".join(string)

I found the idea here. I just implemented it in Python and added comments.
