
Assume the input is specified as an array of building objects, where each building has a number of residents and its distance from the start of the street.

Total distance = SUM(distance[i] * #residents[i])
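To make the objective concrete, here is a minimal brute-force sketch. It assumes `distance[i]` in the formula means the distance from building i to the mailbox, and it represents each building as a hypothetical `(position, residents)` tuple; neither detail is fixed by the problem statement. It is quadratic and only meant as a reference to check faster solutions against.

```python
from typing import List, Tuple

# Hypothetical representation: (distance from start of street, number of residents).
Building = Tuple[float, int]


def total_distance(buildings: List[Building], mailbox: float) -> float:
    """Sum of |position - mailbox| * residents over all buildings."""
    return sum(abs(pos - mailbox) * residents for pos, residents in buildings)


def best_mailbox_bruteforce(buildings: List[Building]) -> float:
    """Try every building position as the mailbox location: O(n^2)."""
    return min((pos for pos, _ in buildings),
               key=lambda p: total_distance(buildings, p))
```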

I found two similar questions here, but their requirements are slightly different:

  • Minimizing weighted sum: The solution to this question finds the minimum path crossing all points. Here I am looking for the minimal sum of the distances from each building to the place where the mailbox is.

  • Minimum Total Distance From Locations: It uses 2D coordinates and, more importantly, the solution doesn't consider the weight (number of residents) at each location.

I saw this problem while reading Elements of Programming Interviews (really nice book, BTW), where it is listed as a variant of the quickselect algorithm. Considering that the median is the point that minimizes the sum of the distances, it looks like the solution would use quickselect to find the building at the "median" position of the street in O(N).

But I can't figure out how to account for the residents in each building and still keep the solution linear.

carlos22
  • So is the mailbox currently present at the start of the street or you want to find a point where if we place the mailbox would reduce the distance to other points ? – zenwraight Jun 23 '17 at 16:39
  • Can you send me link to this problem or tell me the page number of the problem in the book – zenwraight Jun 23 '17 at 16:39
  • @zenwraight sure, its the second option, find a place to put the mailbox in. In the book, the problem is the last variant of the question 12.9 - find the largest kth (page 201 on my copy). I think the google books preview search shows that page: https://www.google.com/search?tbm=bks&q=Elements+of+Programming+Interviews+variant+mailbox+total+distance – carlos22 Jun 23 '17 at 17:35
  • Hint: You want to find a mailbox position that has ___ people on each side. – j_random_hacker Jun 25 '17 at 14:41

2 Answers


We can use the deltas to determine direction. I'll explain what I mean. This applies to choosing a mailbox location at one of the buildings (that is, not in between two buildings):

Choose one of the buildings as a pivot (potential mailbox location). Partition the buildings according to their location in relation to the pivot. While partitioning, keep a record of the closest building on each side of the pivot, as well as (1) the total number of residents on each side of the pivot, and (2) f(side, pivot), representing the sum over that side of each building's distance from the pivot multiplied by the number of residents in that building.

Now we have:

L pivot R

To determine if an improvement can be made for our choice, try each of the closest buildings we recorded earlier:

If we were to move our choice one building to the left, how would the results change? Let's call the closest building on the left build_l, and the right, build_r. So the new results moving our choice one building to the left would be:

Left side:

  f(L, pivot)
  // every building in L, not only build_l, gets closer by this distance
- total_residents(L) * distance(build_l, pivot)

Right side:

  f(R, pivot) 
  // we saved this earlier
+ total_residents(R) * distance(pivot, build_l)
+ num_residents(pivot) * distance(pivot, build_l)

Perform a similar calculation for moving the choice one building to the right to see which yields a smaller total. Then pick the side with the building that yields an improvement and partition it recursively in similar quickselect fashion until an optimal result is found. For the other side we keep track of the total number of residents, and total result for f so far, which we can update with the new additions as we go.
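For comparison, here is a minimal sketch of a closely related partition-based procedure. It is not the exact delta bookkeeping described above: instead of comparing f values, it steers by resident counts, looking for a building with no more than half of all residents strictly on either side (a weighted median). The function name and the `(position, residents)` tuples are assumptions made for illustration, and the input is assumed to be non-empty. With random pivots the expected running time is linear, as in quickselect.

```python
import random
from typing import List, Tuple


def weighted_median_position(buildings: List[Tuple[float, int]]) -> float:
    """Return a building position with at most half of the residents on each side."""
    total = sum(r for _, r in buildings)
    items = list(buildings)
    left_weight = 0  # residents already discarded to the left of all candidates
    while True:
        pivot_pos = random.choice(items)[0]
        less = [b for b in items if b[0] < pivot_pos]
        equal = [b for b in items if b[0] == pivot_pos]
        greater = [b for b in items if b[0] > pivot_pos]
        w_less = left_weight + sum(r for _, r in less)
        w_equal = sum(r for _, r in equal)
        if 2 * w_less > total:
            # More than half of the residents live left of the pivot: go left.
            items = less
        elif 2 * (w_less + w_equal) < total:
            # More than half of the residents live right of the pivot: go right.
            left_weight = w_less + w_equal
            items = greater
        else:
            return pivot_pos
```

On an input with one resident each at distances 0 to 3 and 100 residents at distance 4, every pivot to the left of the last building sees more than half of the residents on its right, so the search keeps moving right and ends at distance 4.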

גלעד ברקן
  • Thanks @גלעד-ברקן, I also got a response from the book author himself and he suggested basically the same thing. I am flagging this as the correct response. – carlos22 Jun 27 '17 at 19:20
  • @carlos22 can you post author's answer here since that answer is not complete - doesn't analyze complexity and doesn't explain why quick select is more optimal than simple pass through the post-processed array – Pavel Podlipensky Aug 17 '20 at 19:17
  • @PavelPodlipensky we do not actually use quickselect. I used that word in the reference, "similar quickselect fashion," to mean that I saw a similarity in the procedure. – גלעד ברקן Aug 17 '20 at 19:29
  • @גלעדברקן so what would be the time complexity for your approach then? Why do we need pivot? – Pavel Podlipensky Aug 17 '20 at 19:36
  • @PavelPodlipensky we need a pivot because the question the OP asked seems to be about how to solve this problem in a way consistent with the book they were reading that "listed [it] as a variant of the quickselect algorithm." – גלעד ברקן Aug 17 '20 at 22:46
  • @גלעדברקן I understand your motives, but still not sure why this the best/fastest approach. Seems slower to me - see my other comment for my O estimate for your algo. – Pavel Podlipensky Aug 18 '20 at 00:20
  • @PavelPodlipensky people don't always ask questions on SO to get the most efficient approach. Sometimes they just want to understand something. – גלעד ברקן Aug 18 '20 at 00:34
  • @PavelPodlipensky I don't understand how you can estimate a worst case complexity of an approach you describe as "not sure I completely understand." – גלעד ברקן Aug 18 '20 at 00:35
  • @PavelPodlipensky it could be that you right -- could you present maybe a simple example where it could result in `O(n^2)` complexity? – גלעד ברקן Aug 18 '20 at 00:49
  • @גלעדברקן for that I need to better understand your approach, do you mind writing c++ code for it? – Pavel Podlipensky Aug 20 '20 at 19:16
  • As the number of residents at each point can be variable and we partition only based on the distance (and not the residents), how will this approach work in the case where the rightmost building (for example) has 100 residents and all subsequent buildings have 1. Suppose the random pivot chosen is the mid index in the unsorted array. Number of residents = [1, 1,1,1,100]; Distance = [0,1,2,3,4]. The solution should return the rightmost building, but your solution will only keep considering the left partitions. – Jahnavi Paliwal Jul 24 '21 at 08:54
  • @JahnaviPaliwal thank you for your comment. Can you please show how the algorithm I described would do what you state, by following the relevant calculations described for the "best improvement?" – גלעד ברקן Jul 24 '21 at 11:21

I'd solve it in the following fashion (pseudocode below).

Pass over the array (assumed sorted by building location) from left to right and compute the cost of putting the mailbox at building i for the residents of all buildings j <= i.

# When we place mailbox at building i, all its residents contribute 0 to the total cost.
current_number_of_residents += residents_at_building[i-1]

# For each resident we've seen so far, the cost is increased by building_location[i] - building_location[i-1]
distance_delta = building_location[i] - building_location[i-1]

C_left[i] = C_left[i-1] + distance_delta * current_number_of_residents

Then we process the array right-to-left in a similar fashion to get C_right. Now we can find the optimal location by checking for the minimum sum over all i:

min_total_distance = min(min_total_distance, C_left[i] + C_right[i])

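Time complexity is O(n) since we make 3 passes over the array, assuming the buildings are already sorted by location; if they are not, sorting first makes it O(n log n). Space complexity is O(n) to keep the C_left and C_right arrays.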
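Put together, the passes might look like the sketch below. The function name and the two parallel input lists are assumptions for illustration, and `positions` is assumed to be non-empty and already sorted in increasing order.

```python
from typing import List, Tuple


def best_mailbox_two_pass(positions: List[float],
                          residents: List[int]) -> Tuple[float, float]:
    """Return (best position, minimal total distance); positions must be sorted."""
    n = len(positions)

    # c_left[i]: cost for residents of buildings 0..i-1 to reach building i.
    c_left = [0] * n
    seen = 0
    for i in range(1, n):
        seen += residents[i - 1]
        c_left[i] = c_left[i - 1] + (positions[i] - positions[i - 1]) * seen

    # c_right[i]: cost for residents of buildings i+1..n-1 to reach building i.
    c_right = [0] * n
    seen = 0
    for i in range(n - 2, -1, -1):
        seen += residents[i + 1]
        c_right[i] = c_right[i + 1] + (positions[i + 1] - positions[i]) * seen

    # Third pass: pick the building with the smallest combined cost.
    best = min(range(n), key=lambda i: c_left[i] + c_right[i])
    return positions[best], c_left[best] + c_right[best]
```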

Quickselect algorithm complexity (suggested by @גלעד-ברקן) is also O(n) on average, but can be O(n^2) in the worst case. So I don't see the benefit of that approach over what I suggested. Any comments are welcome.

Pavel Podlipensky
  • When the OP stated in their question description that the problem "is listed as a variant of the quickselect algorithm" in a book they were reading, I took that to mean that their question is about getting help in how we could apply a similar procedure here. – גלעד ברקן Aug 17 '20 at 22:19
  • Please at least offer an explanation, if not proof, of why you think the procedure I described has worst case`O(n^2)` complexity. I did not suggest using quickselect. I described a procedure that has similarities with it. – גלעד ברקן Aug 17 '20 at 22:25
  • @גלעדברקן I'm not sure I completely understand your approach, thats why I asked you for big O notation estimate. But the reason why I think it is O(n^2) is because every time you pick a pivot you need to update weighted sum of ppl on the left and ppl on the right which will take O(n) time. Please correct me if I'm wrong. – Pavel Podlipensky Aug 17 '20 at 22:42