
I have two arrays of integers, A and B, both of size n. The cost of pairing A(i) with B(j) is |A(i) - B(j)|.
I want to pair up the n elements of A with the n elements of B so that the sum of the costs over all n pairs is minimized.

I understand that I can get O(n log n) by sorting A, sorting B, and then pairing them up index by index from 1 to n, but after attempting it for hours and hours I can't figure out how to prove that this pairing is optimal. Can somebody help me out?

I've seen how to implement it; I just don't get how to prove it.
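
For concreteness, this is the pairing I mean (a rough Python sketch; the function name is just something I made up for this question):

```python
def min_pairing_cost(A, B):
    # Sort both arrays and pair elements of equal rank,
    # then add up |A(i) - B(i)| over the sorted pairing.
    return sum(abs(a - b) for a, b in zip(sorted(A), sorted(B)))

print(min_pairing_cost([3, 1, 2], [5, 0, 4]))  # |1-0| + |2-4| + |3-5| = 5
```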

Jeric
  • I'd try following this approach: let's assume they're sorted and the sum is some `X`, and show that swapping any two items in array `A` will result in a sum `Y` where `Y>X`. – shapiro yaacov Jan 27 '21 at 12:59
  • @shapiroyaacov that's called proof by contradiction, I think – Abhinav Mathur Jan 27 '21 at 13:02
  • @shapiroyaacov I tried going that route, but I couldn't figure out how to do it. I feel like I'm missing an assumption or something obvious because I don't see any way to continue the proof. Also thank you for editing my question to be more readable! – Jeric Jan 27 '21 at 13:04
  • @AbhinavMathur I tried going with "For the sake of contradiction, assume this pairing is not optimal" and tried to find a contradiction, but I got stuck – Jeric Jan 27 '21 at 13:13

3 Answers


I am following a slightly different approach here to prove this fact, using squared differences rather than absolute differences.

Consider 2 arrays, A = [a1, a2, ..., an] and B = [b1, b2, ..., bn].

Now, even if I use an arbitrary pairing (form each pair by taking ai from A and bσ(i) from B, where σ is some permutation of the indices),

the sum of squared differences is S = sum((ai - bσ(i))^2) = sum(ai^2) + sum(bi^2) - 2 * sum(ai * bσ(i)), for i from 1 to n.

To minimise this sum, we need to maximise the part sum(ai * bσ(i)), for i from 1 to n.

The term sum(ai * bσ(i)) will be maximum when the two arrays are sorted and paired in the same order.

Thanks to @Abhinav Mathur for pointing this out: the statement that sum(ai * bi) is maximised when the two arrays are sorted can be proved using the rearrangement inequality.
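
If you want a quick numerical sanity check of that last claim (not a proof, just brute force; the snippet below is a rough Python sketch with made-up names):

```python
from itertools import permutations
import random

# For many random inputs, check that pairing the two sorted arrays index by
# index maximises sum(ai * b_sigma(i)) over every permutation sigma of B.
for _ in range(200):
    A = sorted(random.randint(-20, 20) for _ in range(6))
    B = sorted(random.randint(-20, 20) for _ in range(6))
    sorted_dot = sum(a * b for a, b in zip(A, B))
    best_dot = max(sum(a * b for a, b in zip(A, perm))
                   for perm in permutations(B))
    assert sorted_dot == best_dot
print("sorted pairing maximised sum(ai*bi) in every trial")
```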

Deepak Tatyaji Ahire

Assume that in the current sorted pairing there is a pair with cost |x-a| and another pair with cost |y-b|. Let's say that switching the partners (pairing x with b and y with a) would give a lesser sum, i.e. a more optimal solution.
(Note: while switching the partners of two pairs, the rest of the pairing remains unaffected.)

Current total sum of pairs = |x-a| + |y-b|
Modified sum after switching pairs = |x-b| + |y-a|
Difference in sums = diff = |x-b| - |x-a| + |y-a| - |y-b|

If diff is negative, it means we have found a better ordering. If diff is non-negative, it means our original solution was at least as good.

Now you can take cases and analyse this. Since the arrays are sorted, let x < y (they're from the first array) and a < b (they're from the second array).

  1. Case 1: x>b or y<a:
    In this case, both sums will be equal, which can be easily seen by expanding the modulus
  2. Case 2: a<x<b:
    If y>b, diff = 2*(b-x). Since we assumed b>x, diff is positive.
    If y<b, diff = 2*(y-x). Since y>x as stated earlier, diff is again positive.

You can continue taking similar cases and prove that diff will never be negative, meaning that our original ordering is the most efficient one.
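
If it helps, the sign of diff can also be checked exhaustively for small integers, covering every case at once (a throwaway Python sketch, not part of the proof):

```python
# Verify that diff = |x-b| + |y-a| - (|x-a| + |y-b|) is never negative
# whenever x <= y and a <= b, i.e. swapping the partners of two sorted
# pairs never improves the sum.
R = range(-10, 11)
for x in R:
    for y in R:
        for a in R:
            for b in R:
                if x <= y and a <= b:
                    diff = abs(x - b) + abs(y - a) - abs(x - a) - abs(y - b)
                    assert diff >= 0, (x, y, a, b)
print("diff was non-negative in every case")
```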

Abhinav Mathur

Sorting and pairing creates a matching that we might call "monotonic", which ensures that if A[i] matches B[x] and A[j] matches B[y], then:

  • If A[i] < A[j] then B[x] <= B[y]; and
  • If B[x] < B[y] then A[i] <= A[j]

If you choose a matching that is not monotonic, then one of these rules will be violated for some pair of matchings.

If we pick any two elements from each array such that A[i] <= A[j] and B[x] <= B[y], then we can evaluate the cost of the monotonic pairing and the cost of the swapped pairing. Note that if A[i] = A[j] or B[x] = B[y] then both pairings have the same cost, so it doesn't matter which one we call monotonic.

In order to compare the costs, we need to get rid of the absolute value operations. We can do that by separately considering all the possible orderings between the 4 values:

Case: A[i] <= A[j] <= B[x] <= B[y]:

  • Monotonic cost: B[x]-A[i] + B[y]-A[j]
  • Swapped cost: B[y]-A[i] + B[x]-A[j]
  • Difference: 0
  • cost is the same - doesn't matter which we choose

Case: A[i] <= B[x] <= A[j] <= B[y]

  • Monotonic cost: B[x]-A[i] + B[y]-A[j]
  • Swapped cost: B[y]-A[i] + A[j]-B[x]
  • Difference: 2A[j] - 2B[x]
  • since A[j] >= B[x], monotonic is as good or better

... etc

If you go through all 6 possible orderings, in every case you find that the monotonic matching is as good or better. Given any matching, you can make every pair of element matchings monotonic, and the cost can only go down.

If you start with an optimal matching and make every pair of matchings monotonic, then you end up with an optimal monotonic matching. (In fact the one you start with has to be monotonic if it's optimal, but we don't have to prove that.) Since every monotonic matching has the same cost, and at least one of them is optimal, they must all be optimal.
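
Here is a small illustration of that last step (a rough Python sketch, not part of the proof): start from an arbitrary matching, repeatedly fix any pair of matchings that violates monotonicity, and observe that the cost never increases and ends up equal to the cost of the sorted pairing.

```python
import random

def cost(A, B, match):
    # match[i] is the index in B paired with A[i]
    return sum(abs(A[i] - B[match[i]]) for i in range(len(A)))

def monotonic_cost(A, B):
    # Start from a random matching and repeatedly swap the partners of any
    # pair that violates monotonicity; per the case analysis above, each
    # swap keeps or lowers the cost, and the process terminates.
    n = len(A)
    match = list(range(n))
    random.shuffle(match)
    c = cost(A, B, match)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                if A[i] < A[j] and B[match[i]] > B[match[j]]:
                    match[i], match[j] = match[j], match[i]
                    new_c = cost(A, B, match)
                    assert new_c <= c   # making the pair monotonic never hurts
                    c, changed = new_c, True
    return c

A = [random.randint(0, 50) for _ in range(8)]
B = [random.randint(0, 50) for _ in range(8)]
print(monotonic_cost(A, B))
print(sum(abs(a - b) for a, b in zip(sorted(A), sorted(B))))  # same value
```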

Matt Timmermans