6

I'm working on union finding. I want to group pairs of numbers based on whether a pair shares a number with another pair. So:

I have an array of pairs such as these:

pairs: [[1,3], [6,8], [3,8], [2,7]]

What's the best way to group them into unions such as this:

[ [ 1, 3, 8, 6 ], [ 2, 7 ] ]

([1,3] and [3,8] go together because they share 3. That group unites with [6,8] because they share 8.) What's the best way to do this in JavaScript?

Here are other examples:

pairs: [[8,5], [10,8], [4,18], [20,12], [5,2], [17,2], [13,25],[29,12], [22,2], [17,11]]

into [ [ 8, 5, 10, 2, 17, 22, 11 ],[ 4, 18 ],[ 20, 12, 29 ],[ 13, 25 ] ]

Edit: Here's the method I'm currently using:

findUnions = function(pairs, unions){
   if (!unions){
       unions = [pairs[0]];
       pairs.shift();
   }else{
       if(pairs.length){
           unions.push(pairs[0])
           pairs.shift()
       }
   }

    if (!pairs.length){
        return unions
    }
    unite = true
    while (unite && pairs.length){
        unite = false
        loop1:
        for (i in unions){
            loop2:
            var length = pairs.length;
            for (j=0;j<length;j++){
                if (unions[i].includes(pairs[j][0])){
                    if (!unions[i].includes(pairs[j][1])){
                        unions[i].push(pairs[j][1])
                        pairs.splice(j, 1)
                        j-=1;
                        length-=1
                        unite = true
                    }else{
                        pairs.splice(j, 1)
                        j-=1
                        length-=1
                    }
                }else if (unions[i].includes(pairs[j][1])){
                     unions[i].push(pairs[j][0])
                     pairs.splice(j, 1)
                     unite = true
                    j-=1
                    length-=1
                }
            }
        }
    }
    return findUnions(pairs, unions)
}
Stephen Agwu
  • Why is `8` before `6` at `[1, 3, 8, 6 ]`? Is a specific order not a requirement? – guest271314 Jul 25 '17 at 23:38
  • I'd call this "clique finding" rather than "union finding" – 000 Jul 25 '17 at 23:42
  • No order requirement. 8 is before 6 because my current algorithm adds (3,8) first, but that's not important – Stephen Agwu Jul 25 '17 at 23:43
  • Joe Frambach, I'll edit the post to say that. I'm self-taught, so I'm not too good at specific terms – Stephen Agwu Jul 25 '17 at 23:44
  • @JoeFrambach What is a "clique" in computer programming? – guest271314 Jul 25 '17 at 23:46
  • Similar to this https://en.wikipedia.org/wiki/Clique_(graph_theory) where a pair [x, y] can be thought of as an edge from node x to node y. The output is a set of cliques. – 000 Jul 25 '17 at 23:47
  • @JoeFrambach Interesting. The same term can have different meanings depending on the context. – guest271314 Jul 25 '17 at 23:48
  • Sorry, "clique" is incorrect. It means that all nodes are connected to all other nodes in the clique, which is not what you're looking for. You're looking for "graph partitioning" algorithms or something like that. I'll tag this question accordingly. – 000 Jul 25 '17 at 23:51
  • @JoeFrambach I believe ["forest"](https://en.wikipedia.org/wiki/Tree_(graph_theory)#Forest) is the term you were looking for. And "disjoint union" or "union find" is the type of algorithm he's looking for. And a dfs forest is the simplest of those in my opinion. Though wikipedia gives an interesting one [here](https://en.wikipedia.org/wiki/Disjoint-set_data_structure). – bowheart Jul 26 '17 at 16:59

3 Answers

3

Method:

finalArray = [], positions = {};    
for i to Array.length
   for j=i+1 to Array.length
       find match between arr[i] and arr[j]
       if match found
          pos = position mapped to either i or j in positions
          add elements of arr[i] or arr[j] or both depending on pos.
return finalArray

In this method, we record in the positions object where each input array has been placed in finalArray. We can then use that object to find the right position at which to add the elements of matched arrays.

function mergeArrays(finalArray, pos, subArray) {
    for (var k = 0; k < subArray.length; k++) {
        if (finalArray[pos].indexOf(subArray[k]) < 0)
            finalArray[pos].push(subArray[k]);
    }
}

function unionArrays(arr) {
    var finalArray = [arr[0]],
        positions = {
            0: 0
        };
    for (var i = 0; i < arr.length; i++) {
        for (var j = i + 1; j < arr.length; j++) {
            for (var k = 0; k < arr[i].length; k++) {
                if (arr[j].indexOf(arr[i][k]) >= 0) {
                    if (i in positions) {
                        mergeArrays(finalArray, positions[i], arr[j]);
                        positions[j] = positions[i];
                    } else if (j in positions) {
                        mergeArrays(finalArray, positions[j], arr[i]);
                        positions[i] = positions[j];
                    } else {
                        var pos = finalArray.length;
                        finalArray.push([]);
                        mergeArrays(finalArray, pos, arr[i]);
                        mergeArrays(finalArray, pos, arr[j]);
                        positions[i] = positions[j] = pos;
                    }
                    break;
                }
            }
        }
        if (!(i in positions)) {
            finalArray.push(arr[i]);
            positions[i] = finalArray.length - 1;
        }
    }
    return finalArray;
}
console.log(unionArrays([[1,3], [6,8], [3,8], [2,7]]));
console.log(unionArrays([[8,5], [10,8], [4,18], [20,12], [5,2], [17,2], [13,25],[29,12], [22,2], [17,11]]));
Dij
  • nice, much cleaner than the method I came up with. I'll see if I can improve on this – Stephen Agwu Jul 26 '17 at 01:18
  • Hmmm, it seems your algorithm, while cleaner, might be slower. I'm using this algorithm as part of a larger function, and when I replace mine with yours, the larger function times out (I'm limited to 4000ms). I'll edit the original post with my method so we can compare – Stephen Agwu Jul 26 '17 at 01:32
  • @stephenagwu I have improved my method, check now. – Dij Jul 26 '17 at 02:26
  • Fascinating. A naive algorithm, for sure, but surprisingly effective. I would note that this doesn't degrade well. But it seems sufficient for the task at hand. Good work. – bowheart Jul 26 '17 at 16:49
3

Ah. The algorithm you're looking for is a dfs forest. Wikipedia has some good stuff on trees and forests.

A dfs forest is just a dfs (Depth-First Search) that is run until there are no unvisited nodes. The result is a graph ("forest") of connected and isolated subgraphs ("trees"). These are the "unions" you refer to.

A Depth-First Search is much easier (and faster) when each node is mapped to the nodes to which it's connected. So instead of this data:

[[1,3], [6,8], [3,8], [2,7]]

you want:

{1: [3], 2: [7], 3: [1, 8], 6: [8], 7: [2], 8: [6, 3]}

Transforming your data is fairly trivial (and fast):

function mapNodes(edges) {
    let nodeMap = {}

    edges.forEach(edge => {
        let node1 = edge[0]
        let node2 = edge[1]

        if (!nodeMap[node1]) nodeMap[node1] = [node2]
        else nodeMap[node1].push(node2)

        if (!nodeMap[node2]) nodeMap[node2] = [node1]
        else nodeMap[node2].push(node1)
    })
    return nodeMap
}

Then the dfs itself is a simple recursive algorithm and the dfs forest just keeps running it until there are no more unvisited nodes. Here's a [EDIT: not so] crude example:

function dfsForest(nodeMap) {
    let forest = []
    let nodes = Object.keys(nodeMap)

    while (true) {
        let root = +nodes.find(node => !nodeMap[node].visited)
        if (isNaN(root)) break // all nodes visited

        forest.push(dfs(root, nodeMap))
    }
    return forest
}

function dfs(root, nodeMap, tree = []) {
    if (tree.includes(root)) return tree // base case

    tree.push(root)
    nodeMap[root].visited = true

    let connectedNodes = nodeMap[root]
    for (let i = 0; i < connectedNodes.length; i++) {
        let connectedNode = connectedNodes[i]
        dfs(connectedNode, nodeMap, tree)
    }
    return tree
}

And here's a JSFiddle with all of that.
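For comparison, here is a minimal sketch of the same idea using a Map for the adjacency list, a Set for visited nodes, and an explicit stack instead of recursion (which should avoid deep call stacks on long chains of pairs). The name `groupPairs` is made up for this example and is not taken from the fiddle.

```javascript
// Hypothetical helper: groups pairs into connected components
// with an iterative depth-first search.
function groupPairs(pairs) {
    // Build the adjacency list: node -> list of neighbouring nodes.
    const adjacency = new Map()
    for (const [a, b] of pairs) {
        if (!adjacency.has(a)) adjacency.set(a, [])
        if (!adjacency.has(b)) adjacency.set(b, [])
        adjacency.get(a).push(b)
        adjacency.get(b).push(a)
    }

    const visited = new Set()
    const forest = []
    for (const start of adjacency.keys()) {
        if (visited.has(start)) continue // already part of an earlier tree

        const tree = []
        const stack = [start]
        visited.add(start)
        while (stack.length) {
            const node = stack.pop()
            tree.push(node)
            for (const next of adjacency.get(node)) {
                if (!visited.has(next)) {
                    visited.add(next)
                    stack.push(next)
                }
            }
        }
        forest.push(tree)
    }
    return forest
}

console.log(groupPairs([[1,3], [6,8], [3,8], [2,7]]))
// [ [ 1, 3, 8, 6 ], [ 2, 7 ] ]
```

Each node is pushed and popped at most once, and each edge is looked at twice, so the work should grow roughly linearly with the number of pairs.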

EDIT:

Well, I said it was crude. I've edited the code and the fiddle, removing the extra visitedNodes array and the n-squared algorithm it created. It should now be just about as fast as anything yet discovered.

In my tests, it takes about 350 milliseconds to reformat the data AND run the dfs forest on 5000 very non-optimal pairs. In an optimal case, it takes about 50 milliseconds. And it degrades very well. For example, doubling the total number of edges increases the execution time by a factor of between 1.5 and 2.5, depending on how optimal the pairs are.

In fact, here's a JSFiddle with the answer by @Dij. You'll see if you double the number of edges, execution time quadruples (yikes). His algorithm does have an interesting feature, in that there are no optimal/non-optimal cases; everything takes the same amount of time. However, even in the most non-optimal case, a dfs forest is still slightly faster than that flat rate.
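The disjoint-set ("union find") structure mentioned in the comments on the question is another option. The following is only a rough sketch of that technique, not something taken from either fiddle, and `unionFindGroups` is a name invented for this example. Each number points at a parent, and joining two groups means pointing one root at the other, so two groups that formed separately still merge correctly when a later pair connects them.

```javascript
// Hypothetical helper: groups pairs using a disjoint-set (union-find) structure.
function unionFindGroups(pairs) {
    const parent = new Map() // node -> parent node; a root is its own parent

    // Return the root of x's group, compressing the path along the way.
    function find(x) {
        if (!parent.has(x)) parent.set(x, x)
        let root = x
        while (parent.get(root) !== root) root = parent.get(root)
        while (parent.get(x) !== root) {
            const next = parent.get(x)
            parent.set(x, root)
            x = next
        }
        return root
    }

    // Union step: link the two roots if the pair spans different groups.
    for (const [a, b] of pairs) {
        const rootA = find(a)
        const rootB = find(b)
        if (rootA !== rootB) parent.set(rootB, rootA)
    }

    // Collect every node under its root.
    const groups = new Map()
    for (const node of parent.keys()) {
        const root = find(node)
        if (!groups.has(root)) groups.set(root, [])
        groups.get(root).push(node)
    }
    return [...groups.values()]
}

console.log(unionFindGroups([[1,3], [6,8], [3,8], [2,7]]))
// [ [ 1, 3, 6, 8 ], [ 2, 7 ] ]
```

This sketch uses path compression only; adding union by rank or size is the usual refinement if the parent chains get long.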

bowheart
  • Dang, I never thought about doing it this way. The problem I'm having is that when there are 5000 pairs (the maximum amount), my algorithm times out. I'll let you know what happens with yours. Thanks for the answer! – Stephen Agwu Jul 26 '17 at 11:47
  • @stephenagwu I wasn't kidding when I said this was a crude example. I hastily threw it together without really thinking about performance. I should have more respect for SO, and I apologize about that. I've edited it now to make it a true, awesome dfs forest. Happy coding! – bowheart Jul 26 '17 at 14:41
1

To meet the first requirement, you can iterate over the array and, within each iteration, build a new array of all the other pairs, excluding the current one. Check whether any of those pairs contain one or more elements of the current pair; if so, push them to a new array.

Then filter the original array for pairs which do not share any element with the previously filtered array.

Use Set to remove duplicate entries from the arrays.
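As a small illustration of that last step on its own, spreading a Set back into an array keeps the first occurrence of each value. The `unique` name is just for this example and does not appear in the code that follows.

```javascript
// Hypothetical helper: remove duplicates while keeping first-seen order.
const unique = values => [...new Set(values)];

console.log(unique([1, 3, 3, 8, 6, 8])); // [ 1, 3, 8, 6 ]
```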

const arr = [[1,3], [6,8], [3,8], [2,7]];

let res = [];

for (const [key, [a, b]] of Object.entries(arr)) {
  const adjacent = arr.filter((el, index) => index !== +key);

  const has = adjacent.filter(el => el.includes(a) || el.includes(b));
  res = [...res, ...has.filter(prop => !res.includes(prop))];
}

let not = new Set(...arr.filter(([a, b]) => !res.some(([c, d]) => 
            a === c || b === d || a === d || b === c)));

let set = new Set();

for (const [a, b] of res) {
  if (!set.has(a)) set.add(a);
  if (!set.has(b)) set.add(b);
}

res = [[...set], [...not]];

console.log(res);
guest271314
  • Wow, thanks for this answer. I'm a bit new, so I'm still trying to decipher each line. For instance: in the line starting with const adjacent = arr.filter... (the first one), what does the '+key' part represent? – Stephen Agwu Jul 26 '17 at 01:45
  • @stephenagwu `Object.entries()` returns an array of the property, value pairs of an object. When an array is passed, the property, value pairs are in general [index, array], or `[key, [a, b]]` using destructuring assignment, where `key` is the index as a string and `[a, b]` represents the elements of the array, for example `1` : `a`, `3`: `b`. Object properties are strings; the `+` operator casts the index `key` of the array to a number. The efficiency of the procedure could probably be improved. – guest271314 Jul 26 '17 at 01:50
  • oh so that would be the same as using parseInt(key)? – Stephen Agwu Jul 26 '17 at 01:52
  • Yes, or `Number("1")` – guest271314 Jul 26 '17 at 01:53
  • interesting. From looking through your code it seems I know how to do a lot of those things just not in such a high level way. But the use of Sets was amazing, thanks for the answer! – Stephen Agwu Jul 26 '17 at 01:55
  • `Set` was used to remove duplicates from array and could probably be substituted for `Array` methods. – guest271314 Jul 26 '17 at 01:57