I have two one-dimensional NumPy arrays X and Y. I need to calculate the mean absolute difference between each element of X and each element of Y. The naive way is to use a nested for loop:

import numpy as np
np.random.seed(1)
X = np.random.randint(10, size=10)
Y = np.random.randint(10, size=10)

s = 0
for x in X:
    for y in Y:
        s += abs(x - y)
mean = s / (X.size * Y.size)
# 3.44

Question: Does NumPy provide a vectorized, faster version of this solution?

Edited: I need the mean absolute difference (always non-negative). Sorry for the confusion.

Mateen Ulhaq
DYZ
  • just for convenience, I think it's usually helpful to set the seed `np.random.seed(1)` whenever we generate random arrays - that way the answers can exactly reproduce your results. – Gene Burinsky May 20 '18 at 00:33
  • @GeneBurinsky fair enough. – DYZ May 20 '18 at 00:34
  • The elephant in the room is that your code generates rank 1 arrays, e.g. (10, ). So, miradulo's code produces the same result as your code for this type of array. However, if your array is of a different shape (2x5) for example, then the results are very different. – KRKirov May 20 '18 at 00:43
  • @KRKirov In general, you are right. But that's why I said that my arrays are linear (one-dimensional). Let me clarify this point in the question. – DYZ May 20 '18 at 00:45

3 Answers

If I correctly understand what your definition is here, you can just use broadcasting.

np.mean(np.abs(X[:, None] - Y))
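
As a rough sketch of why the broadcasting one-liner works (using the seeded arrays from the question; the variable names `pairwise`, `vectorized_mean` and `loop_mean` are only illustrative): `X[:, None]` turns `X` into a column of shape (10, 1), so subtracting the one-dimensional `Y` yields a (10, 10) matrix holding every pairwise difference.

```python
import numpy as np

np.random.seed(1)
X = np.random.randint(10, size=10)
Y = np.random.randint(10, size=10)

# X[:, None] has shape (10, 1); subtracting Y (shape (10,)) broadcasts
# to a (10, 10) matrix whose entry [i, j] is X[i] - Y[j].
pairwise = np.abs(X[:, None] - Y)
vectorized_mean = pairwise.mean()

# Reference value computed the same way as the nested loop in the question.
loop_mean = sum(abs(int(x) - int(y)) for x in X for y in Y) / (X.size * Y.size)

print(pairwise.shape, vectorized_mean, loop_mean)
```

`np.subtract.outer(X, Y)` should build the same difference matrix. Either way the intermediate array has `X.size * Y.size` elements, so memory use grows with the product of the two lengths.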
miradulo

Take the difference, then abs, then mean:

np.mean(np.abs(X - Y))

Alternatively:

diff = X - Y
abs_diff = np.abs(diff)
mean_diff = np.sum(abs_diff) / X.size  # X - Y is element-wise, so there are X.size differences
Mateen Ulhaq
  • Neat! But I need the mean _absolute_ difference (always non-negative). Sorry for the confusion. I don't think your solution could be extended to handle abs(). – DYZ May 20 '18 at 00:24
  • @DyZ just add `np.abs` to any of the differentiating clauses. `np.abs(X-Y)` or `np.mean(np.abs(X-Y))` – Gene Burinsky May 20 '18 at 00:27
  • @GeneBurinsky Nope, it's not the same. – DYZ May 20 '18 at 00:28
  • Unfortunately, this is not the same. Take `X=array([8, 4, 6, 0, 9, 2, 3, 2, 9, 4])` and `Y=array([4, 0, 1, 8, 4, 9, 2, 5, 7, 7])`. Your method gives 4.2, the true mean is 3.4. abs() and mean() are not commutative. – DYZ May 20 '18 at 00:32
  • @DyZ fair. I suppose we misunderstood as we assumed the difference between two arrays whereas what you actually meant to ask is the difference between every element in `X` and every element in `Y` – Gene Burinsky May 20 '18 at 00:37
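
To make the distinction in these comments concrete, here is a small sketch using the arrays DYZ quotes above. The names `elementwise` and `pairwise` are only illustrative: the first pairs `X[i]` with `Y[i]`, while the second compares every element of `X` with every element of `Y`.

```python
import numpy as np

X = np.array([8, 4, 6, 0, 9, 2, 3, 2, 9, 4])
Y = np.array([4, 0, 1, 8, 4, 9, 2, 5, 7, 7])

# Element-wise: only 10 differences, X[i] - Y[i].
elementwise = np.mean(np.abs(X - Y))

# Pairwise: all 100 differences, X[i] - Y[j], via broadcasting.
pairwise = np.mean(np.abs(X[:, None] - Y))

print(elementwise)  # 4.2
print(pairwise)     # 3.4
```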

If you tile the two arrays along opposite axes, you can take the absolute difference of every pair like this:

Code:

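# Repeat each array as many times as the other has elements, so both
# end up with shape (Y.size, X.size) even when the lengths differ.
x = np.tile(X, (Y.size, 1))
y = np.transpose(np.tile(Y, (X.size, 1)))

mean_diff = np.sum(np.abs(x-y)) / (X.size * Y.size)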

Test Code:

import numpy as np
X = np.random.randint(10, size=10)
Y = np.random.randint(10, size=10)

s = 0
for x in X:
    for y in Y:
        s += abs(x - y)
mean = s / (X.size * Y.size)
print(mean)

x = np.tile(X, (Y.size, 1))
y = np.transpose(np.tile(Y, (X.size, 1)))

print(np.sum(np.abs(x-y)) / (X.size * Y.size))

Results:

3.48
3.48
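
The test above uses arrays of equal length. As a hedged sketch of the same tiling idea for unequal lengths, the repeat counts can be taken from the other array so that both tiled arrays share the shape (Y.size, X.size). The seed, the sizes and the names `tiled_mean` and `loop_mean` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
X = rng.integers(10, size=7)
Y = rng.integers(10, size=4)

# Tile X once per element of Y, and Y once per element of X, then
# transpose Y's tiling so both arrays have shape (Y.size, X.size).
x = np.tile(X, (Y.size, 1))
y = np.tile(Y, (X.size, 1)).T
tiled_mean = np.abs(x - y).mean()

# Nested-loop reference for comparison.
loop_mean = sum(abs(int(a) - int(b)) for a in X for b in Y) / (X.size * Y.size)

print(tiled_mean, loop_mean)
```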
Stephen Rauch