16

I found a puzzle online on interviewStreet and tried to solve it as follows:

There is an infinite integer grid on which N people have their houses. They decide to unite at a common meeting place, which is someone's house. From any given cell, all 8 adjacent cells are reachable in 1 unit of time, e.g. (x,y) can be reached from (x-1,y+1) in a single unit of time. Find a common meeting place which minimizes the sum of the travel times of all the persons.

I first thought about writing a solution with O(N²) time complexity, but the constraints are:

1 <= N <= 10^5, and the absolute value of each coordinate in the input will be at most 10^9.

So I changed my approach: instead of looking at the problem in terms of distances and travel times, I treated the houses as bodies with weights. Instead of calculating all the distances, I look for the center of gravity of the group of bodies.

Here's the code of my "solve" function. vectorToTreat is a length × 2 table storing the coordinates of the points on the grid, and resul is the number to print to stdout:

long long solve(long long** vectorToTreat, int length){
    long long resul = 0;
    int i;
    long long x=0;
    long long y=0;
    int tmpCur=-1;
    long long tmp=-1;
    for(i=0;i<length;i++){
        x+=vectorToTreat[i][0];
        y+=vectorToTreat[i][1];
    }
    x=x/length;
    y=y/length;
    tmp = max(absol(vectorToTreat[0][0]-x),absol(vectorToTreat[0][1]-y));
    tmpCur = 0;
    for(i=1;i<length;i++){
        if(max(absol(vectorToTreat[i][0]-x),absol(vectorToTreat[i][1]-y))<tmp){
            tmp = max(absol(vectorToTreat[i][0]-x),absol(vectorToTreat[i][1]-y));
            tmpCur = i;
        }
    }
    for(i=0;i<length;i++){
        if(i!=tmpCur)
            resul += max(absol(vectorToTreat[i][0]-vectorToTreat[tmpCur][0]),absol(vectorToTreat[i][1]-vectorToTreat[tmpCur][1]));
    }

    return resul;
}

The problem now is that I pass only 12 of the 13 official test cases, and I don't see what I'm doing wrong. Any ideas? Thanks in advance. AE

templatetypedef
Peter
  • 6
    The problem is that the center of gravity, and the point which minimizes total distance, are *different* points. If you're trying to find the latter, don't write an algorithm which finds the former. – Eric Lippert Aug 24 '11 at 06:53
  • 1
    I don't know a solution but I would suggest trying to get some insights by solving a simpler version of the problem. There is a linear solution if the problem is restricted to one dimension; work out that solution, and then see if that helps solve the two-dimensional problem. – Eric Lippert Aug 24 '11 at 16:59
  • I guess your solution solves the problem, when the distance to the diagonal cells is calculated with Pythagoras. – user unknown Aug 24 '11 at 19:56
  • Can there be more than one house per cell? Does the meeting point have to be one of the occupied cells (a house), can it be a free cell without a house, or does it not matter? Have you tried to find the average x- and y-position? Do you have sample data and solutions for these? – user unknown Aug 24 '11 at 20:21
  • Hello, thanks to you all for commenting on this post. @Eric Lippert, the center of gravity is indeed different from the point that minimizes total distance, but in my code, after finding the center of gravity I look for the nearest point and say: this is the solution point. – Peter Aug 25 '11 at 02:22
  • @user unknown, the website stated these samples to solve: 4 0 1 2 5 3 1 4 0 And: 6 12 -14 -3 3 -14 7 -14 -3 2 -12 -1 -6 So I assumed that there is only one house per cell. – Peter Aug 25 '11 at 02:23
  • @user unknown, the first example is with 4 houses, and the second is with 6. The results should be 8 for the first sample and 54 for the second. – Peter Aug 25 '11 at 02:24
  • Is this (4,0)(1,2)(5,3)(1,4)(0,?) and what is the trailing 0, and (6 12) (-14 -3)( 3 -14)( 7 -14)( -3 2)( -12 -1)( -6 ?) - or how do these numbers describe positions in the grid? Position (3,2) for the first sample would be my solution too, with distanceSum(8). – user unknown Aug 25 '11 at 03:57

9 Answers

11

The key to this problem is the notion of centroid of a set of points. The meeting place is the closest house to the centroid for the set of points representing all the houses. With this approach you can solve the problem in linear time, i.e. O(N). I did it in Python, submitted my solution and passed all tests.

However, it is easy to build a data set for which the centroid approach does not work. Here's an example:

[(0, 0), (0, 1), (0, 2), (0, 3), 
 (1, 0), (1, 1), (1, 2), (1, 3), 
 (2, 0), (2, 1), (2, 2), (2, 3), 
 (3, 0), (3, 1), (3, 2), (3, 3), 
 (101, 101)]

The best solution is meeting at the house at (2, 2) and the cost is 121 (you can find this with exhaustive search - O(N^2)). However, the centroid approach gives a different result:

  • centroid is (7, 7)
  • closest house to centroid is (3, 3)
  • cost of meeting at (3, 3) is 132

The test cases on the web site are obviously shaped in such a way that the centroid solution is OK, or perhaps they just wanted to figure out if you know about the notion of centroid.
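The comparison above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the function names (chebyshev, total_cost, centroid_pick, exhaustive_pick) are made up here, and "closest to the centroid" is taken to mean Euclidean distance, as one of the comments on this answer suggests.

```python
def chebyshev(p, q):
    """Travel time between two cells when all 8 neighbours cost 1 unit."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def total_cost(meeting, houses):
    """Sum of everyone's travel time to the given meeting place."""
    return sum(chebyshev(meeting, h) for h in houses)

def centroid_pick(houses):
    """House closest (Euclidean distance, an assumption) to the centroid."""
    n = len(houses)
    cx = sum(x for x, _ in houses) / n
    cy = sum(y for _, y in houses) / n
    return min(houses, key=lambda h: (h[0] - cx) ** 2 + (h[1] - cy) ** 2)

def exhaustive_pick(houses):
    """O(N^2) reference answer: try every house as the meeting place."""
    return min(houses, key=lambda h: total_cost(h, houses))

# The 4x4 block of houses plus one far-away house from the example above.
houses = [(x, y) for x in range(4) for y in range(4)] + [(101, 101)]

by_centroid = centroid_pick(houses)
by_search = exhaustive_pick(houses)
print("centroid heuristic:", by_centroid, total_cost(by_centroid, houses))
print("exhaustive search: ", by_search, total_cost(by_search, houses))
```

On this data set the heuristic picks (3, 3) with cost 132, while the exhaustive search picks (2, 2) with cost 121.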

MarcoS
  • 1
    Just clarifying for anyone looking for the solution - "The meeting place is the closest house to the centroid for the set of points representing all the houses" - "Closest" here means Euclidean distance and not the Chebyshev distance (as otherwise used in the problem). The approach definitely works, by the way :) – Vikesh May 10 '12 at 20:14
  • Does NOT work for: (12 -14) (-3 3) (-14 7) (-14 -3) (2 -12) (-1 -6) – CommonMan Aug 21 '12 at 18:27
  • @Manan I wrote in my own answer that it's easy to find an example for which the centroid approach doesn't work. – MarcoS Aug 22 '12 at 09:40
8

I didn't read your code, but consider the following example:

  • 2 guys live at (0, 0)
  • 1 guy lives at (2, 0)
  • 4 guys live at (3, 0)

The center of gravity is at (2, 0), where the total travel time is 8, but the optimum solution is at (3, 0), with a minimum total travel time of 7.
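To check these numbers, here is a small Python sketch that tabulates the total travel time for each distinct house. It assumes several people may share one cell, as in the example; the helper names travel_time and total_time are illustrative only.

```python
# The example above: 2 people at (0, 0), 1 at (2, 0), 4 at (3, 0).
houses = [(0, 0)] * 2 + [(2, 0)] + [(3, 0)] * 4

def travel_time(p, q):
    """Travel time on the 8-connected grid (Chebyshev distance)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def total_time(meeting):
    """Sum of all travel times if everyone meets at `meeting`."""
    return sum(travel_time(meeting, h) for h in houses)

# Center of gravity: the plain average of each coordinate.
center = (sum(x for x, _ in houses) / len(houses),
          sum(y for _, y in houses) / len(houses))

# Total travel time for each distinct occupied cell.
totals = {h: total_time(h) for h in sorted(set(houses))}

print("center of gravity:", center)
for house, total in totals.items():
    print(house, "->", total)
```

The center of gravity comes out as (2.0, 0.0), and the totals are 14 for (0, 0), 8 for (2, 0) and 7 for (3, 0).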

Rotsor
  • There is also an optimum at (0,0) in this example. – John L Aug 24 '11 at 07:07
  • @John, No, there is no. The total travel time will be 6 for (0,0). Or did I miss something in the problem statement? – Rotsor Aug 24 '11 at 07:19
  • Oops, I *did* miss something, but that does not make (0,0) an optimum solution. – Rotsor Aug 24 '11 at 07:28
  • Hmm, this feels slightly wrong, since it assumes we're talking about Euclidean distance, but we're talking about an eight-connected grid (so a d_\infty metric instead of a d_2). For instance, (0,0) is as far from (5,0) as it is from (5,5), so it should not be too hard to create a counterexample. I think the idea is sound though, but you should probably devise a 'center of gravity' for a d_\infty metric instead of a d_2 metric. – markijbema Aug 24 '11 at 07:31
  • Sorry, I misread your example, you're right (0,0) definitely isn't an optimum here. – John L Aug 24 '11 at 07:32
  • @markijbema, you can see that the OP's solution uses euclidean center of gravity calculation, so the counter-example should be valid. – Rotsor Aug 24 '11 at 08:19
  • However, the question was how to solve the stated problem, not how to improve the speed of the OP's (incorrect) solution... – markijbema Aug 24 '11 at 14:21
  • @markijbema, sorry, what? What speed are you talking about? My answer contains a counter-example to the idea of taking the center of gravity as the problem solution. Nothing more, nothing less. Also, the question does not ask how to solve the problem, but rather asks to find the bug, which, I believe, I did! – Rotsor Aug 24 '11 at 14:37
  • I removed all the unnecessary text from the answer making it simpler to understand. – Rotsor Aug 24 '11 at 14:59
  • Hello, thanks to you all for answering and commenting on this post. @Rotsor, in my code, after finding the center of gravity I look for the nearest point and say: this is the solution point. I assumed that only one person can live in one cell; the samples that the puzzle stated led me to make this assumption. But I think it is wrong to just assume this, so I will look at the problem differently. Thanks. – Peter Aug 25 '11 at 02:32
  • @Amine, Even if you assume that, it's trivial to adjust the example: `(0,0), (1,0), (2000,0), (3000,0), (3001,0), (3002, 0), (3003, 0)` – Rotsor Aug 25 '11 at 04:32
4

Hello, and thanks to you all for your answers and comments; they were very helpful. I finally gave up on my algorithm using the center of gravity. When I ran some samples on it, I noticed that when the houses are gathered in different villages with different distances between them, the algorithm does not work. Consider the example that @Rotsor gave above:

(0,0), (1,0), (2000,0), (3000,0), (3001,0), (3002, 0), (3003, 0)

The algorithm using the center of gravity answers that the 3rd house is the solution, but the right answer is the 4th house. The right notion to use in this kind of problem is the median, adapted to the number of dimensions wanted. Here is a great article about the geometric median; I hope it helps.
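A quick Python sketch of this example, restricted to one dimension since all the houses lie on the x-axis. It only illustrates the mean-versus-median difference on a line and is not a full two-dimensional solution; the helper name total_distance is made up for the illustration.

```python
from statistics import median_low

# x-coordinates of the seven houses in the example (all have y = 0).
xs = [0, 1, 2000, 3000, 3001, 3002, 3003]

def total_distance(meeting_x, xs):
    """Sum of travel times along the line to the meeting point."""
    return sum(abs(meeting_x - x) for x in xs)

# Center-of-gravity approach: house nearest to the average position.
mean_x = sum(xs) / len(xs)
nearest_to_mean = min(xs, key=lambda x: abs(x - mean_x))

# Median approach: median_low always returns an actual data point.
median_x = median_low(xs)

print("nearest to mean:", nearest_to_mean, total_distance(nearest_to_mean, xs))
print("median:         ", median_x, total_distance(median_x, xs))
```

The house nearest to the mean is at x = 2000 (total 8005), while the median house is at x = 3000 (total 7005).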

Peter
2

SOLUTION:

  1. If all points are on a line and people can move only in 2 directions (left and right):

    Sort the points and calculate two arrays, one for the cost of the people who move only left and the other for the people who move only right. Add both arrays and find the minimum to find the solution.

  2. If people can move in only 4 directions (left, down, up, right), you can apply the same rule to each axis. The only thing you need to support is undoing the sort: when you sort along one axis, you must also save the sorting permutation.

  3. If people can move in 8 directions (as in the question), you can use the same algorithm as for 4 directions (algorithm 2). If you look closely at the movements, you can see that the same number of moves is enough if everybody moves only diagonally (left-up, up-right, left-down and down-right), with no need to move left, right, up or down, provided that (x+y) % 2 == 0 holds for each point (x,y). Imagine that the grid is a chessboard and the houses are on black squares only.

    Before applying algorithm 2 you have to transform the points so that

    (x,y) becomes (x+y,x-y). This is a rotation of the points by 45 degrees (combined with a scaling). Then you apply algorithm 2 and divide the result by 2.
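A minimal Python sketch of the three steps above, assuming the houses are given as a list of (x, y) integer tuples. The names axis_costs and best_meeting_cost are hypothetical and do not come from the answer. The sketch relies on the identity max(|dx|, |dy|) = (|dx + dy| + |dx - dy|) / 2, and instead of saving the sorting permutation it looks each value up in the sorted array by binary search, which should give the same result in O(N log N).

```python
from bisect import bisect_right
from itertools import accumulate

def axis_costs(values):
    """For each v in values, return sum(|v - w| for w in values).

    Uses one sort plus prefix sums, so the whole call is O(N log N).
    """
    ordered = sorted(values)
    prefix = [0] + list(accumulate(ordered))  # prefix[k] = sum of k smallest
    n = len(ordered)
    total = prefix[-1]
    costs = []
    for v in values:
        k = bisect_right(ordered, v)               # entries <= v
        left = v * k - prefix[k]                   # cost of entries <= v
        right = (total - prefix[k]) - v * (n - k)  # cost of entries > v
        costs.append(left + right)
    return costs

def best_meeting_cost(houses):
    """Minimum total travel time when meeting at one of the houses."""
    # Transform (x, y) -> (x + y, x - y): the 8-direction travel time
    # becomes half of the 4-direction (Manhattan) distance.
    u_costs = axis_costs([x + y for x, y in houses])
    v_costs = axis_costs([x - y for x, y in houses])
    # Each pairwise term is 2 * max(|dx|, |dy|), so every sum is even.
    return min(u + v for u, v in zip(u_costs, v_costs)) // 2

print(best_meeting_cost([(0, 1), (2, 5), (3, 1), (4, 0)]))
print(best_meeting_cost([(12, -14), (-3, 3), (-14, 7),
                         (-14, -3), (2, -12), (-1, -6)]))
```

For the two samples quoted in the comments on the question this prints 8 and 54, matching the expected results given there.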

Luka Rahne
0

"...which is someone's house" means you pick an occupied house, not an arbitrary location.

Edit: oops, max(abs(a-A),abs(b-B)) replaces (abs(a-A)+abs(b-B)). See L_p space for more details when p -> infinity.

The distance from (a,b) to (A,B) is max(abs(a-A),abs(b-B)). A brute force way is to compute the total travel time to meet at each occupied house, keeping track of the best meeting place so far.

This may take a while. Sorting by distance from the center of mass may allow you to prioritize the search order. I see you are using the right center-of-mass calculation for this metric: a simple average of the first coordinate and a simple average of the second coordinate.

Chris Kuklewicz
  • 1
    Hmmm. I don't know about your equation. Maybe it should be distance from (a,b) to (A,B) = max(abs(a-A), abs(b-B)), since a diagonal is the same as an adjacent. Distance from (0,0) to (2,3) is 3, for example, not 5. – Phil Freihofner Aug 24 '11 at 08:15
  • I believe your distance formula is not correct since traveling in a diagonal only costs 1 unit. Just get the example from the description: `(x,y) can be reached from (x-1,y+1) in a single unit of time` but your formula would return 2 instead of 1. It should be `max(abs(a-A), abs(b-B))`. – user85421 Aug 24 '11 at 08:18
0

If you think a bit about the distance function, you get this as the travel time between (x1,y1) and (x2,y2):

def dist(p, q):
    # p and q are (x, y) tuples
    dx = abs(q[0] - p[0])
    dy = abs(q[1] - p[1])
    return max(dx, dy)

You can see this if you make a sketch on paper with a grid.

So you only have to iterate over each house, sum up the travel times of the others, and take the house with the minimum sum.

The full solution is

houses = [(7, 4), (1, 1), (3, 2), (-3, 2), (2, 7), (8, 3), (10, 9)]

def dist(p, q):
    # travel time between two (x, y) cells when all 8 neighbours cost 1 unit
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return max(dx, dy)

def summed_time_to(p0, houses):
    return sum(dist(p0, p1) for p1 in houses)

# (total travel time, house index) for every candidate meeting place
distances = [(summed_time_to(p, houses), i) for i, p in enumerate(houses)]
distances.sort()

min_dist = distances[0][0]

print("best houses are:")
for d, i in distances:
    if d == min_dist:
        print(i, "at", houses[i])
rocksportrocker
  • 2
    Which is the n-squared solution that the original poster states he is attempting to improve on. Can you improve this to better than n-squared? – Eric Lippert Aug 24 '11 at 15:06
  • @rocksportrocker, thanks for your effort, but as Eric said, I did test this solution before and it didn't work out well for large input. – Peter Aug 25 '11 at 02:40
0

I wrote a quick-and-dirty grid-distance tester in Scala, which compares the average with the minimum of an exhaustive search:

class Coord (val x:Int, val y: Int) {
  def delta (other: Coord) = {
    val dx = math.abs (x - other.x)
    val dy = math.abs (y - other.y)
    List (dx, dy).max
  }
  override def toString = " (" + x + ":" + y + ") "
}

def run (M: Int) {
  val r = util.Random 
  // reproducable set:
  // r.setSeed (17)

  val ucells = (1 to 2 * M).map (dummy => new Coord (r.nextInt (M), r.nextInt (M))).toSet take (M) toSeq
  val cells = ucells.sortWith ((a,b) => (a.x < b.x || a.x == b.x && a.y <= b.y))

  def distanceSum (lc: Seq[Coord], cell: Coord) = lc.map (c=> cell.delta (c)).sum

  val exhaustiveSearch = for (x <- 0 to M-1;
    y <- 0 to M-1)
      yield (distanceSum (cells, new Coord (x, y)))

  def sum (lc: Seq[Coord]) = ((0,0) /: lc) ((a, b) => (a._1 + b.x, a._2 + b.y))
  def avg (lc: Seq[Coord]) = {
    val s = sum (lc) 
    val l = lc.size 
    new Coord ((s._1 + l/2) / l, (s._2 + l/2) / l)
  }
  val av = avg (ucells)
  val avgMethod = distanceSum (cells, av)

  def show (cells : Seq[Coord]) {
     val sc = cells.sortWith ((a,b) => (a.x < b.x || a.x == b.x && a.y <= b.y))
     var idx = 0
     print ("\t")
     (0 to M).foreach (i => print (" " + (i % 10))) 
     println ()
     for (x <- 0 to M-1) {
       print (x + "\t")
       for (y <- 0 to M -1) {
         if (idx < M && sc (idx).x == x && sc (idx).y == y) {
           print (" x") 
           idx += 1 }
           else if (x == av.x && y == av.y) print (" A")
           else print (" -")
       }
       println ()
     }
  }

  show (cells)
  println ("exhaustive Search: " + exhaustiveSearch.min)
  println ("avgMethod: " + avgMethod)
  exhaustiveSearch.sliding (M, M).toList.map (println)
}

Here is some sample output:

run (10)
     0 1 2 3 4 5 6 7 8 9 0
0    - x - - - - - - - -
1    - - - - - - - - - -
2    - - - - - - - - - -
3    x - - - - - - - - -
4    - x - - - - - - - -
5    - - - - - - x - - -
6    - - - - A - - x - -
7    - x x - - - - - - -
8    - - - - - - - - - x
9    x - - - - - - - - x
exhaustive Search: 36
avgMethod: 37
Vector(62, 58, 59, 60, 62, 64, 67, 70, 73, 77)
Vector(57, 53, 50, 52, 54, 57, 60, 63, 67, 73)
Vector(53, 49, 46, 44, 47, 50, 53, 57, 63, 69)
Vector(49, 46, 43, 41, 40, 43, 47, 53, 59, 66)
Vector(48, 43, 41, 39, 37, 37, 43, 49, 56, 63)
Vector(47, 43, 39, 37, 36, 37, 39, 46, 53, 61)
Vector(48, 43, 39, 36, 37, 38, 40, 43, 51, 59)
Vector(50, 44, 40, 39, 38, 40, 42, 45, 49, 57)
Vector(52, 47, 44, 42, 42, 42, 45, 48, 51, 55)
Vector(55, 52, 49, 47, 46, 47, 48, 51, 54, 58)

The average isn't always the perfect position (as shown in this example), but you can follow the neighbours with an equal or better value to find the best position. It is a good starting point, and I never found a sample with a local optimum where you get stuck. This could be essential for huge datasets.

But I don't have a proof that this is always the case, and I don't know how to find the perfect position directly.

user unknown
  • Your exhaustive search checked all places, you only need to consider the house positions. – rocksportrocker Aug 25 '11 at 08:46
  • If you restrict the problem to a one-dimensional problem (eg all y fixed) the optimal solution is the median (and not the average) of all x-position. So you should replace your averaging by calculating the median, maybe this improves your method. – rocksportrocker Aug 25 '11 at 08:46
  • @rocksportrocker: Yes, I somehow missed, that it is one of the houses. But for small grids, like 8x8, or even 40x40, calculating all positions isn't a problem for my high-performance year-2005-laptop. It allows to see level curves, which might help to understand the problem. – user unknown Aug 25 '11 at 12:32
  • How do I build the median for a 2-dim grid? I can easily calculate the average, and search for a house with distance 0, 1, 2, ... . – user unknown Aug 25 '11 at 12:59
  • As I said, the median is the solution for the 1-dim problem. So you have to compute the median separately for the x and y coordinates. – rocksportrocker Aug 25 '11 at 13:11
  • But if I find a med-house at x=42 and a med-house at y=34, there needn't be a house at (42, 34). What does it help? – user unknown Aug 25 '11 at 13:25
-1

I tried this too, but passed only 4 of the 13 test cases. The judge reports a segmentation fault.

But I only use two arrays of 100001 elements each, plus a few variables.

My algorithm:

Find the centroid of the given points.

Find the point closest to the centroid, then get the sum of all the distances using maximum(abs(a-A),abs(b-B)).

-1

I tried to solve this using the geometric median method, but only 11 of the 13 test cases passed. This was my strategy:

1. Find the centroid of the set of points.
2. Then find the point closest to that centroid.
pkvprakash