20

I have a Python list like this:

arr = [110, 60, 30, 10, 5] 

What I need to do is find the difference between every number and each of the other numbers, and then take the average of all those differences.

In this case, it would first take the difference between 110 and each of the remaining elements (60, 30, 10, 5), then the difference between 60 and each of the elements after it (30, 10, 5), and so on.

Finally, it computes the average of all these differences.

Now, this can easily be done with two For Loops but in O(n^2) time complexity and also a little bit of "messy" code. Is there a faster and more efficient way to do this?

Stef
Asad Hussain
  • 2
    Is the input guaranteed to be sorted in descending order? And if it's not, do you want the mean of the *absolute* differences, or do you always want to subtract later elements from previous elements when computing differences? – user2357112 Aug 08 '22 at 11:35
  • @user2357112 For my case specifically, the numbers will be sorted in descending order. However, I think a generic way of solving it could also be computed and might be helpful for people. – Asad Hussain Aug 08 '22 at 11:50

3 Answers

36

I'll just give the formula first:

import numpy as np

n = len(arr)
# weights n-1, n-3, ..., -(n-1); divide by the number of pairs
out = np.sum(np.asarray(arr) * np.arange(n-1, -n, -2)) / (n*(n-1) / 2)
# 52.0

Explanation: You want to find the mean of

a[0] - a[1], a[0] - a[2],..., a[0] - a[n-1]
             a[1] - a[2],..., a[1] - a[n-1]
                         ...

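In these differences,

`a[0]` occurs `n-1` times with a `+` sign and `0` times with a `-` sign -> net coefficient `n-1`
`a[1]` occurs `n-2` times with a `+` sign and `1` time with a `-` sign -> net coefficient `n-3`
... and so on, down to `a[n-1]`, whose net coefficient is `-(n-1)`.

These net coefficients are exactly the weights produced by `np.arange(n-1, -n, -2)`, and the weighted sum is divided by the number of pairs, `n*(n-1)/2`.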
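The counting argument above can be sketched in plain Python as follows. The function names `mean_pairwise_diff` and `mean_abs_pairwise_diff` are made up for illustration. The second function assumes that what is wanted for unsorted input is the mean of the absolute differences, which it gets by sorting in descending order first.

```python
def mean_pairwise_diff(values):
    """Mean of values[i] - values[j] over all pairs i < j, in O(n) time."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to form a pair")
    # Net coefficient of each element: n-1, n-3, ..., -(n-1).
    weights = range(n - 1, -n, -2)
    total = sum(x * w for x, w in zip(values, weights))
    # n*(n-1)/2 is the number of pairs.
    return total / (n * (n - 1) / 2)


def mean_abs_pairwise_diff(values):
    """Mean of |x - y| over all unordered pairs, in O(n log n) time.

    Assumption: for unsorted input the absolute difference is wanted.
    After sorting in descending order every difference is non-negative,
    so the signed formula gives the same result.
    """
    return mean_pairwise_diff(sorted(values, reverse=True))


print(mean_pairwise_diff([110, 60, 30, 10, 5]))      # 52.0
print(mean_abs_pairwise_diff([30, 5, 110, 10, 60]))  # 52.0
```

Sorting dominates the cost in the second function, so it is still faster than the O(n^2) double loop for large inputs.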
Quang Hoang
  • 6
    The base Python equivalent is `out = sum(x*j for x, j in zip(arr, range(n-1, -n, -2)))/(n*(n-1)/2)`. – J.G. Aug 08 '22 at 12:48
  • And for completeness, n*(n-1)/2 is (n choose 2), which is the number of pairs. – qwr Aug 09 '22 at 08:54
8

Not as brilliant as @QuangHoang's answer, but this one uses NumPy broadcasting to compute the matrix of differences and then averages the upper-triangle values to get the answer.

import numpy as np

a = np.array([110, 60, 30, 10, 5])

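# n x n matrix of pairwise differences a[i] - a[j], via broadcasting
dif = a.reshape(len(a), -1) - a
# average the strict upper triangle (i < j); selecting by position, not by
# dif != 0, keeps genuine zero differences when values repeat
out = np.mean(dif[np.triu_indices(len(a), k=1)])
# 52.0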
AboAmmar
5

Now, this can easily be done with two For Loops but in O(n^2) time complexity and also a little bit of messy code.

No need for the code to be messy.

from statistics import mean

arr = [110, 60, 30, 10, 5]

m = mean(arr[i] - arr[j] for i in range(len(arr)) for j in range(i+1, len(arr)))

print(m)
# 52

If the array is not sorted, replace `arr[i] - arr[j]` with `abs(arr[i] - arr[j])`.
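As a small variation, the index arithmetic can be left to `itertools.combinations`, which yields each unordered pair exactly once. This is only a sketch, and `mean_abs_diff` is a name invented here. It is still O(n^2), but it works on unsorted input.

```python
from itertools import combinations
from statistics import mean


def mean_abs_diff(values):
    """Mean absolute difference over all unordered pairs (O(n^2))."""
    # combinations(values, 2) yields every pair (x, y) with x before y.
    return mean(abs(x - y) for x, y in combinations(values, 2))


print(mean_abs_diff([30, 110, 5, 60, 10]))  # 52
```

With fewer than two values there are no pairs, so `statistics.mean` raises `StatisticsError`.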

Stef