
Given a list of elements, say [1,2,3,4], and their pair-wise affiliation, say

[[0,  0.5, 1,  0.1]
 [0.5, 0,  1,  0.9]
 [ 1,  1,  0,  0.2]
 [0.1, 0.9, 0.2, 0]]

For those familiar with graph theory, this is basically a weighted adjacency matrix.

What is the fastest way to sort the list such that the distance in the list best correlates with the pair-wise affiliation, i.e. pairs of nodes with high affiliation should be close to each other.

Is there a way to do this (even a greedy algorithm would be fine) without going too much into MDS and ordination theory?

As a bonus question:

Note that some pair-wise affiliations can be represented perfectly, like for the list [1,2,3] and a pair-wise affiliation:

[[0, 0, 1]
 [0, 0, 1]
 [1, 1, 0]]

the perfect order would be [1,3,2]. But some affiliations can't, like this one:

[[0, 1, 1]
 [1, 0, 1]
 [1, 1, 0]]

where any order is equally good/bad.

Is there a way to tell the quality of an ordering? In the sense of how well it represents the pair-wise affiliations?
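
One possible quality measure, sketched under an assumption the question does not fix: take the cost of an ordering to be the sum over all pairs of affiliation times distance in the list, so lower is better. The name `arrangementCost` and the 0-based node indices are illustrative choices, and other objectives (L1/L2 error against an embedding, for instance) would rank orderings differently.

```javascript
// Hypothetical helper: cost of an ordering under the ASSUMED objective
//   sum over pairs i<j of mat[i][j] * |pos(i) - pos(j)|
// `order` is a permutation of the 0-based node indices.
function arrangementCost(mat, order) {
    var n = order.length;
    var pos = new Array(n); // pos[node] = slot of that node in the list
    for (var k = 0; k < n; k++) {
        pos[order[k]] = k;
    }
    var cost = 0;
    for (var i = 0; i < n; i++) {
        for (var j = i + 1; j < n; j++) {
            cost += mat[i][j] * Math.abs(pos[i] - pos[j]);
        }
    }
    return cost;
}

var chain = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0]
];
var triangle = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0]
];
console.log(arrangementCost(chain, [0, 2, 1]));    // 2 (the order [1,3,2] in 1-based labels)
console.log(arrangementCost(chain, [0, 1, 2]));    // 3
console.log(arrangementCost(triangle, [0, 1, 2])); // 4
console.log(arrangementCost(triangle, [1, 0, 2])); // 4
```

Under this measure the order [1,3,2] beats [1,2,3] for the first matrix, and every order of the fully connected matrix scores the same, which matches the two examples above.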

j-i-l
  • Might do better on [cs.se]. Flag and ask for moderator migration if you agree. – AakashM Aug 05 '15 at 12:47
  • @AakashM hm, maybe you are right. Honestly I'm not so sure where this fits best. Could also be a candidate for http://math.stackexchange.com/. I'll wait and see for now, if others are for migrating it, that's totally fine for me. – j-i-l Aug 05 '15 at 12:50
  • So what exactly is your metric / optimization objective? (e.g. L1/L2 errors)? – sascha Aug 05 '15 at 13:05
  • @sascha you can also just look at an unweighted graph (boolean weight). The metric is not a priori defined but depends on the space you choose to project into. One way would be to just put the graph in an n dim Euclidean space (where n is at most the length of your list). If you decrease n, eventually it will not be possible to place your nodes such that the configuration respects perfectly the affiliation matrix. The error (L1/L2 up to you) between affiliation matrix and the effective distance in the reduced space can be used as a quality measure. But as I said, without MDS, if possible. – j-i-l Aug 05 '15 at 13:17
  • Fastest in execution time or in implementation time? – David Eisenstat Aug 05 '15 at 13:32
  • I'm still confused. If you want an ordering, you should have an objective a priori (in terms of projections: this should be a 1d projection, like a discrete number line). If you have this objective, it doesn't matter if there is a perfect order or not (if you treat it as an optimization problem). I don't know anything about MDS, and therefore may think differently about your problem. Let's take a super-naive objective function: sum of losses (L1) where there are quadratic losses; one for each pair -> product of (1-affiliation) and distance in ordering. Is the solution of this what you want? – sascha Aug 05 '15 at 13:33
  • @DavidEisenstat fast in terms of iterations rather than time. It should not be a brute force method as this would likely not work with long lists. – j-i-l Aug 05 '15 at 13:36
  • @sascha you are correct and your example would work. A solution to my question would be the heuristics for an algorithm that effectively minimises the quadratic losses. I just wanted to point out that the metric is not a priori defined as you could embed a graph into a vector space with any metric (even non-proper metrics like the cosine distance can work). – j-i-l Aug 05 '15 at 14:00
  • I threw together an O(c * n^2 + n * log(n)) attractive-force 1d equilibrium solver. There's a possibility that it could give you a palatable approximation. – Louis Ricci Aug 05 '15 at 16:30

1 Answer

Here's a lightly tested algorithm that takes the adjacency matrix, places the elements/nodes on a number line in order of appearance, then iterates to find an equilibrium. Since it's 1d, I just picked a really simple attractive-force formula. Adding a repulsive force might improve it.

/*
 * Sort the nodes of an adjacency matrix
 * @param {Array<Array<number>>} mat square matrix of pair-wise weights
 * @return {Array<number>} sorted list of node indices
 */
function sort1d(mat) {
    var n = mat.length;
    // equilibrium total force threshold
    var threshold = 1 / (n * n);
    var map = new Map(); // <index, position>
    // initial positions
    for(var i = 0; i < n; i++) {
        map.set(i, i);
    }
    // find an equilibrium (a local minimum)
    var prevTotalForce;
    var totalForce = n * n;
    do {
        prevTotalForce = totalForce;
        totalForce = 0;      
        for(var i = 0; i < n; i++) {
            var posi = map.get(i);
            var force = 0;
            for(var j = i + 1; j < n; j++) {
                var posj = map.get(j);
                var weight = mat[i][j];
                var delta = posj - posi;
                force += weight * (delta / n);
            }
            // force on node i = Sum[j=i+1..n-1]( W_ij * ( D_ij / n ) ), with D_ij = pos_j - pos_i
            map.set(i, posi + force);
            totalForce += force;
        }
        console.log(totalForce, prevTotalForce);
    } while(totalForce < prevTotalForce && totalForce >= threshold);
    var list = [];
    // Map to List<[position, index]>
    map.forEach(function(v, k) { list.push([v, k]); });
    // sort list by position
    list.sort(function(a, b) { return a[0] - b[0]; });
    // return sorted indices
    return list.map(function(vk) { return vk[1]; });
}

var mat = [
    [0,  0.5, 1,  0.1],
    [0.5, 0,  1,  0.9],
    [1,  1,  0,  0.2],
    [0.1, 0.9, 0.2, 0]
];
var mat2 = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0]
];
console.log(sort1d(mat)); // [2, 0, 1, 3]
console.log(sort1d(mat2)); // [0, 1, 2]
Louis Ricci
  • Cool, I'll have a closer look at it as soon as I find the time to. In principle, if you have the constraint that only 1 node can occupy a slot in the list, then this constraint is de facto already a repulsive force (a very weird one though that only acts locally). I wonder if this could be done in a faster way than O(n^2)... – j-i-l Aug 05 '15 at 16:38
  • @jojo - I initialize the node positions to their integer index rather than using random positioning as in a 2d force-directed display, which may improve the results. When force is being applied the new positions are real numbers, so they are just points on a number line; they only become slots in an array at the final sorting step. Faster than O(n^2) would probably mean a less accurate approximation, maybe by creating n lists, each sorted by the corresponding row in the adjacency matrix, then somehow merging the lists together. – Louis Ricci Aug 05 '15 at 16:57
  • Here's an idea for initializing: Find the minimum spanning tree of the graph (since you have affinities, not distances, you have to tweak the MST cost function a little bit). Then you can iterate over the MST for the initial placements. To iterate over a tree node (which may have any number of children), iterate over half its children, then yield the node itself, then iterate over the rest of the children. – Jerry Federspiel Aug 05 '15 at 17:35
  • @JerryFederspiel - For all I know this 1d version doesn't even need optimal starting positions to converge on an answer. I don't really know how this would be solved with 100% accuracy, possibly solving a set of linear equations (or inequalities) where the variables are the indices of the final list and the inequalities are the various weights between vertices. – Louis Ricci Aug 05 '15 at 18:04
  • Yeah, it shouldn't need optimal positions to start. I thought it might reduce the iteration count, but I haven't tested that or anything. It feels like the real solution should be close to some in-order iteration over the MST... but that's probably because I have some implicit assumptions of "niceness" in the affinities that won't always hold. Something that always finds the optimum even when data isn't nice may very well be asymptotically equivalent to brute force. – Jerry Federspiel Aug 05 '15 at 18:13
  • @JerryFederspiel You are probably right about the convergence to a time scale close to brute force for particular configurations. It is, at least as I see it, not a problem with a nice solution. However, I'm pretty confident that there are some greedy approaches which get to, say, an acceptable solution. But admittedly I don't know what such a greedy approach should look like. – j-i-l Aug 05 '15 at 20:17
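
The spanning-tree initialization suggested in the comments could be sketched roughly as follows. This is one reading of that idea, not a tested or optimal method: it assumes the "tweak" is to build a maximum spanning tree over the affinities (Prim's algorithm, O(n^2) on a dense matrix), roots the tree at node 0 arbitrarily, and uses a hypothetical function name `mstOrder`. The traversal visits half of a node's children, then the node, then the remaining children.

```javascript
// Sketch: initial ordering from a MAXIMUM spanning tree of the affinities.
// Assumptions: symmetric square matrix, node 0 as an arbitrary root.
function mstOrder(mat) {
    var n = mat.length;
    if (n === 0) { return []; }
    var inTree = new Array(n).fill(false);
    var best = new Array(n).fill(-Infinity); // strongest known link into the tree
    var parent = new Array(n).fill(-1);
    var children = [];
    for (var i = 0; i < n; i++) { children.push([]); }
    best[0] = 0;
    // Prim's algorithm, picking the highest-affinity edge at each step
    for (var step = 0; step < n; step++) {
        var u = -1;
        for (var v = 0; v < n; v++) {
            if (!inTree[v] && (u === -1 || best[v] > best[u])) { u = v; }
        }
        inTree[u] = true;
        if (parent[u] !== -1) { children[parent[u]].push(u); }
        for (var w = 0; w < n; w++) {
            if (!inTree[w] && mat[u][w] > best[w]) {
                best[w] = mat[u][w];
                parent[w] = u;
            }
        }
    }
    // Traverse: first half of the children, the node itself, then the rest
    var order = [];
    function visit(node) {
        var kids = children[node];
        var half = Math.floor(kids.length / 2);
        for (var a = 0; a < half; a++) { visit(kids[a]); }
        order.push(node);
        for (var b = half; b < kids.length; b++) { visit(kids[b]); }
    }
    visit(0);
    return order;
}

var mat = [
    [0,  0.5, 1,  0.1],
    [0.5, 0,  1,  0.9],
    [1,  1,  0,  0.2],
    [0.1, 0.9, 0.2, 0]
];
var chain = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0]
];
console.log(mstOrder(mat));   // [0, 2, 1, 3]
console.log(mstOrder(chain)); // [0, 2, 1]
```

The resulting order could be used directly or fed to an iterative solver as its starting positions. Whether that actually reduces the iteration count is untested, as noted in the comments.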