Efficiently obtain the graph from given facts

Question

I have a set of n points in the 2-D (xy) plane:

(x1, y1), (x2, y2), (x3, y3), ......................., (xn, yn)

My objective is to draw a graph. Two nodes (points) in the graph will be connected iff abs(difference in x coordinate) + abs(difference in y coordinate) = L, where L is given.

It can be done in O(n*n). Is it possible to do it more efficiently?
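For reference, the O(n*n) baseline mentioned above can be sketched as follows. This is only an illustration: the `Point` struct, the function name `bruteForceEdges`, and the use of integer coordinates are assumptions, not part of the question.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

struct Point { long long x, y; };  // integer coordinates are an assumption

// Returns every index pair (i, j), i < j, whose Manhattan distance is exactly L.
std::vector<std::pair<int, int>> bruteForceEdges(const std::vector<Point>& pts,
                                                 long long L) {
    assert(L >= 0);  // a negative L can never match a distance
    std::vector<std::pair<int, int>> edges;
    const int n = static_cast<int>(pts.size());
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            const long long d = std::llabs(pts[i].x - pts[j].x) +
                                std::llabs(pts[i].y - pts[j].y);
            if (d == L) edges.emplace_back(i, j);
        }
    }
    return edges;
}
```

Every pair is tested once, so the running time is quadratic regardless of how many edges actually exist.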

BTW I am trying to solve this problem

Should you connect all points, or can you drop points that do not satisfy the condition? And which kind of graph do you want to draw? — Rachid Ait Abdesselam, Dec 15 '16 at 18:58
If two points do not satisfy the condition, then there will be no edge between them. — cryptomanic, Dec 15 '16 at 19:16
Do you have any additional information about points? Without it it's not possible to work better than O(n^2) because the graph may contain O(n^2) edges. — maxim1000, Dec 15 '16 at 19:35

ruakh · Accepted Answer · 2016-12-16T04:59:31.310

You can do this in O(n log n + E) time, where E is the actual number of edges you end up with (that is, the number of pairs of neighbors).

For any given point, its allowed-neighbor locations form a diamond shape with side-length L√2:

        *
      *   *
    *       *
  *           *
*       o       *
  *           *
    *       *
      *   *
        *

If you sort the points by x + y with fallback to x − y, then a single O(n + E) pass through the sorted points will let you find all neighbors of this type:

        *
          *
            *
              *
        o       *

for each point. (To do this, you use an index i to keep track of the current point you're finding neighbors for, and a separate index j to keep track of the line of allowed neighbors such that x_j + y_j = x_i + y_i + L. That may sound like O(n²), since you have two indices into the array; but the trick is that j is monotonically increasing with i, so each of i and j makes just a single pass through the array. This would even be an O(n) pass, except that if you do find any neighbors of (x_i, y_i), then you'll need to re-consider them as potential neighbors for (x_{i+1}, y_{i+1}), so you can't increment j. So it comes out to an O(n + E) pass.)

You can then re-sort them by y − x with fallback to x + y, and repeat the process to find these neighbors:

        *
      *
    *
  *
*       o

And since neighbor-ness is a symmetric relation, you don't actually need to worry about the remaining neighbors:

        o
  *           *
    *       *
      *   *
        *

(The overall O(n log n + E) time includes O(n log n) time to sort the points, plus the time for the two O(n + E) passes.)
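The two passes can be sketched as below. This is a sketch under stated assumptions rather than the answer's own code: it assumes integer coordinates and L > 0, and the names `Key`, `sweep`, and `manhattanEdges` are made up for illustration. It relies on the identity |dx| + |dy| = max(|d(x + y)|, |d(x − y)|), so a neighbor is a point whose x + y or y − x differs by exactly L while the other key differs by at most L. One deliberate deviation: the second pass uses a reach of L − 1 instead of L, so pairs sitting on a diamond corner are reported by the first pass only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { long long x, y; };
using Edge = std::pair<int, int>;

// Sort key for one pass: primary, then secondary ("fallback"), plus the
// point's original index.
struct Key { long long primary, secondary; int index; };

// One pass: reports every (i, j) with primary_j == primary_i + L and
// |secondary_j - secondary_i| <= reach.
static void sweep(std::vector<Key> keys, long long L, long long reach,
                  std::vector<Edge>& out) {
    std::sort(keys.begin(), keys.end(), [](const Key& a, const Key& b) {
        return a.primary != b.primary ? a.primary < b.primary
                                      : a.secondary < b.secondary;
    });
    std::size_t j = 0;  // start of the neighbor window; it never moves back
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const long long want = keys[i].primary + L;
        const long long lo = keys[i].secondary - reach;
        const long long hi = keys[i].secondary + reach;
        // Skip everything that sorts before (want, lo).
        while (j < keys.size() &&
               (keys[j].primary < want ||
                (keys[j].primary == want && keys[j].secondary < lo))) {
            ++j;
        }
        // Scan the window with k, leaving j in place for the next point.
        for (std::size_t k = j; k < keys.size() && keys[k].primary == want &&
                                keys[k].secondary <= hi; ++k) {
            out.emplace_back(keys[i].index, keys[k].index);
        }
    }
}

std::vector<Edge> manhattanEdges(const std::vector<Point>& pts, long long L) {
    assert(L > 0);  // L == 0 would only pair coincident points
    std::vector<Key> bySum, byDiff;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        const long long sum = pts[i].x + pts[i].y;
        const long long diff = pts[i].x - pts[i].y;
        bySum.push_back({sum, diff, i});    // x + y, fallback x - y
        byDiff.push_back({-diff, sum, i});  // y - x, fallback x + y
    }
    std::vector<Edge> edges;
    sweep(bySum, L, L, edges);       // upper-right side, both corners included
    sweep(byDiff, L, L - 1, edges);  // upper-left side, corners excluded
    return edges;
}
```

Each edge comes back as a pair of indices into the input vector, in no particular order. Coincident points are treated as distinct nodes.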

Nice explanation. I am curious about the maximum value of E possible for a given n. For n=1 (E=0), n=2 (E=1), n=3 (E=3), ... As the value of n varies, can we say something about the worst-case value of E? — cryptomanic, Dec 16 '16 at 09:20
One more question: it seems that the same edge can come up more than once. — cryptomanic, Dec 16 '16 at 09:30
@cryptomanic: For a graph in general, E is at most n(n-1)/2. (Google "handshake problem" for details.) For your specific problem, though, it's definitely much lower. Can two points be at the same location? If so, then I think the upper bound is about n²/4. If not, then I'm not sure, but I think it's O(n). — ruakh, Dec 16 '16 at 16:31
@cryptomanic: If you implemented this as I described, then the only time an edge should appear more than once is when one point sits at a corner of the other's diamond, for example one at (x,y) and the other at (x,y+L) or (x+L,y) (since then both passes find the pair). If that's a problem, then you can address it by changing one of the passes to skip such cases. — ruakh, Dec 16 '16 at 16:36

score 1 · Answer 2 · answered Dec 15 '16 at 18:59

It is certainly possible to do it efficiently given certain assumptions about the data. I'll think about more general cases. If, for instance, the points are distributed homogeneously and the interaction distance L(given) is small relative to the spread of the data then the problem can be converted to O(n) by binning the particles.

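Binning overlays a uniform grid of square cells on the plane and assigns each particle to the cell that contains it, so each particle only has to be compared against particles in nearby cells rather than against all n.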

The bin size is taken to be >=L(given) and, for any particle, the particle's bin and the 8 neighbouring bins are all searched. If the number of particles in a bin averages a constant d, then the problem is solvable in O(9dn)=O(n) time.
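A minimal sketch of the binning idea, assuming integer coordinates, a bin size of exactly L, and cell indices that fit in 32 bits. The names `gridEdges`, `floorDiv`, and `cellKey` are hypothetical helpers, not from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>

struct Point { long long x, y; };

// Floor division, so negative coordinates land in the correct cell.
static long long floorDiv(long long a, long long b) {
    const long long q = a / b;
    return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}

// Packs two cell indices into one key (assumes each fits in 32 bits).
static std::uint64_t cellKey(long long cx, long long cy) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
           static_cast<std::uint32_t>(cy);
}

std::vector<std::pair<int, int>> gridEdges(const std::vector<Point>& pts,
                                           long long L) {
    assert(L > 0);
    const int n = static_cast<int>(pts.size());
    std::unordered_map<std::uint64_t, std::vector<int>> bins;
    for (int i = 0; i < n; ++i) {
        bins[cellKey(floorDiv(pts[i].x, L), floorDiv(pts[i].y, L))].push_back(i);
    }
    std::vector<std::pair<int, int>> edges;
    for (int i = 0; i < n; ++i) {
        const long long cx = floorDiv(pts[i].x, L);
        const long long cy = floorDiv(pts[i].y, L);
        // A neighbor differs by at most L in each axis, so it must be in
        // the point's own bin or one of the 8 surrounding bins.
        for (long long dx = -1; dx <= 1; ++dx) {
            for (long long dy = -1; dy <= 1; ++dy) {
                const auto it = bins.find(cellKey(cx + dx, cy + dy));
                if (it == bins.end()) continue;
                for (int j : it->second) {
                    if (j <= i) continue;  // report each pair once
                    const long long d = std::llabs(pts[i].x - pts[j].x) +
                                        std::llabs(pts[i].y - pts[j].y);
                    if (d == L) edges.emplace_back(i, j);
                }
            }
        }
    }
    std::sort(edges.begin(), edges.end());  // deterministic output order
    return edges;
}
```

The final sort is only there to make the output order predictable; it can be dropped if the order does not matter. If many points crowd into a few bins, the cost degrades toward the quadratic baseline.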

score 0 · Answer 3 · answered Dec 15 '16 at 19:41

Another possibility, related to the foregoing, is to use a sparse-matrix structure to store 1 values at the location of all of your points and 0 values elsewhere.

While nice libraries exist for this, you can fake it by coming up with a hash which combines your x and y coordinates. In C++ that looks something like:

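// std::hash has no specialization for std::pair, so a hasher must be supplied (PairHash is a placeholder name).
std::unordered_set< std::pair<int,int>, PairHash > hashset;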

Presize the hashtable so it is perhaps 30-50% larger than you need to avoid expensive rehashing.

Add all the points to the hashset; this takes O(n) time.

Now, the interaction distance L(given) defines a diamond about a center point. You could pregenerate a list of offsets for this diamond. For instance, if L=2, the offsets are:

int dx[]={-2,-1,0,1,2, 1, 0,-1};
int dy[]={ 0, 1,2,1,0,-1,-2,-1};

Now, for each point, loop over the list of offsets and add them to that point's coordinates. This generates an implicit list of locations where neighbours could be. Use the hashset to check if that neighbour exists. With integer coordinates the diamond has 4L offsets, so this takes O(nL) time, which is O(n) for a fixed L, and is efficient if 4L << n (with some qualifications about the number of neighbours reachable from the first node).
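Put together, the hash-set approach might look like the sketch below. It assumes integer coordinates and L > 0. `PairHash`, `diamondOffsets`, and `hashEdges` are illustrative names, and the hash-combining formula is just one common choice, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

using P = std::pair<int, int>;

// std::hash has no specialization for std::pair, so supply a hasher.
struct PairHash {
    std::size_t operator()(const P& p) const {
        const std::size_t h1 = std::hash<int>()(p.first);
        const std::size_t h2 = std::hash<int>()(p.second);
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

// The 4L integer offsets at Manhattan distance exactly L from the origin.
std::vector<P> diamondOffsets(int L) {
    assert(L > 0);
    std::vector<P> off;
    off.reserve(4 * static_cast<std::size_t>(L));
    for (int k = 0; k < L; ++k) {
        off.emplace_back(k, L - k);      // top corner toward the right corner
        off.emplace_back(L - k, -k);     // right corner toward the bottom
        off.emplace_back(-k, -(L - k));  // bottom corner toward the left
        off.emplace_back(-(L - k), k);   // left corner toward the top
    }
    return off;
}

// Returns each edge once, as a pair of coordinates (smaller point first).
std::vector<std::pair<P, P>> hashEdges(const std::vector<P>& pts, int L) {
    std::unordered_set<P, PairHash> present;
    present.reserve(pts.size() * 2);  // presize to avoid rehashing
    present.insert(pts.begin(), pts.end());
    const std::vector<P> off = diamondOffsets(L);
    std::vector<std::pair<P, P>> edges;
    for (const P& p : present) {
        for (const P& d : off) {
            const P q(p.first + d.first, p.second + d.second);
            // p < q keeps one of the two symmetric discoveries.
            if (p < q && present.count(q) > 0) edges.emplace_back(p, q);
        }
    }
    return edges;
}
```

Because the set stores locations rather than indices, coincident input points collapse into a single node here. Keeping a map from location to a list of indices would be one way around that.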

score 0 · Answer 4 · answered Dec 16 '16 at 02:01

I like @ruakh's solution a lot. Another approach, storing the points in a k-d tree, would allow incrementally growing the point set without a loss in efficiency.

To add each point P, you'd search the tree for points Q meeting your criteria and add edges when any were found.

At each level of any k-d tree search, the rectangular extent represented by each child is available. In this case, you would continue the search "down" into a child node only if its extent could possibly contain a point matching P. I.e., the rectangle would have to include some part of the diamond that @ruakh describes.

Analysis of k-d tree searches is usually tricky. I'm pretty sure this algorithm runs in expected O(|E| log n) time for a random point set, but it's also pretty easy to imagine point sets where performance is better and others where it's worse.
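The pruning test for a child's extent can be sketched as follows. It is not a full k-d tree, only the rectangle-versus-diamond check, and the names `Rect` and `mayContainNeighbor` are assumptions. The idea is that the Manhattan distance from P varies continuously over the rectangle, so the rectangle touches the diamond exactly when L lies between the smallest and largest distance from P to the rectangle.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Rect { double xlo, xhi, ylo, yhi; };  // axis-aligned extent of a child

// True if some location inside r is at Manhattan distance exactly L from
// (px, py). False means the subtree can be skipped safely.
bool mayContainNeighbor(const Rect& r, double px, double py, double L) {
    assert(r.xlo <= r.xhi && r.ylo <= r.yhi);
    // Smallest per-axis gap between P and the rectangle (0 if P is inside).
    const double minDx = std::max({r.xlo - px, 0.0, px - r.xhi});
    const double minDy = std::max({r.ylo - py, 0.0, py - r.yhi});
    // Largest per-axis gap, reached at one of the rectangle's corners.
    const double maxDx = std::max(std::fabs(px - r.xlo), std::fabs(px - r.xhi));
    const double maxDy = std::max(std::fabs(py - r.ylo), std::fabs(py - r.yhi));
    return minDx + minDy <= L && L <= maxDx + maxDy;
}
```

A `true` result only says the extent overlaps the diamond, not that a stored point actually lies on it, so the search still has to test the individual points it reaches.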

score -1 · Answer 5 · answered Dec 15 '16 at 21:43

Consider the lines y = x and y = -x, and the distance of each point from these lines. Two points are connected only if they have the right difference of distance to these two lines. Thus you can bucket all the points by distance to these lines. Then, in each bucket, keep an ordered map of points (ordered by how far along the line they are). Any points within the right distance in this ordered map should be connected in the graph. This should be N*log(N) worst case, even if all the points are on top of each other.
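One possible reading of this bucketing idea is sketched below, assuming integer coordinates and L > 0. It uses x + y and x − y as stand-ins for the (scaled) distances to the two lines, and sorted vectors with binary search in place of an ordered map. The names `Bucket`, `matchBuckets`, and `bucketEdges` are hypothetical. Note that listing the edges adds their count to the running time, so this sketch is O(n log n + E) rather than O(n log n).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Point { long long x, y; };
using Edge = std::pair<int, int>;
// One bucket: (position along the line, original index), kept sorted.
using Bucket = std::vector<std::pair<long long, int>>;

// Reports pairs whose bucket keys differ by exactly L and whose positions
// along the line differ by at most reach.
static void matchBuckets(std::map<long long, Bucket>& buckets, long long L,
                         long long reach, std::vector<Edge>& out) {
    for (auto& kv : buckets) std::sort(kv.second.begin(), kv.second.end());
    for (const auto& kv : buckets) {
        const auto target = buckets.find(kv.first + L);
        if (target == buckets.end()) continue;
        const Bucket& b = target->second;
        for (const auto& item : kv.second) {
            // Indices are >= 0, so (pos, -1) sorts before every real entry.
            auto it = std::lower_bound(b.begin(), b.end(),
                                       std::make_pair(item.first - reach, -1));
            for (; it != b.end() && it->first <= item.first + reach; ++it) {
                out.emplace_back(item.second, it->second);
            }
        }
    }
}

std::vector<Edge> bucketEdges(const std::vector<Point>& pts, long long L) {
    assert(L > 0);
    std::map<long long, Bucket> bySum, byDiff;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        const long long s = pts[i].x + pts[i].y;
        const long long d = pts[i].x - pts[i].y;
        bySum[s].emplace_back(d, i);
        byDiff[d].emplace_back(s, i);
    }
    std::vector<Edge> edges;
    matchBuckets(bySum, L, L, edges);       // x + y differs by L
    matchBuckets(byDiff, L, L - 1, edges);  // x - y differs by L, corners skipped
    return edges;
}
```

The second call uses a reach of L − 1 so that pairs differing by L in both keys are reported only once.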
