Python: fast mean squared error between two large 2D lists

Question

I want to calculate the MSE (mean squared error) between two very large 2D arrays, row by row.

x1 = [1,2,3]
x2 = [1,3,5]
x3 = [1,5,9]
x = [x1,x2,x3]
y1 = [2,3,4]
y2 = [3,4,5]
y3 = [4,5,6]
y = [y1,y2,y3]

The expected result is a vector of size 3:

[mse(x1,y1), mse(x2,y2), mse(x3,y3)]
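For reference, each entry is the mean of the squared differences of one pair of rows, MSE_i = (1/n) * sum_j (x_ij - y_ij)^2. The minimal pure-Python sketch below spells out that definition on the small example. The helper name `row_mse` is hypothetical (it is not part of sklearn), but for 1-D inputs it should give the same value as sklearn's `mean_squared_error` with default arguments.

```python
def row_mse(u, v):
    # Hypothetical helper: mean of squared element-wise differences of two rows.
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

x = [[1, 2, 3], [1, 3, 5], [1, 5, 9]]
y = [[2, 3, 4], [3, 4, 5], [4, 5, 6]]

# One MSE per pair of rows (xi, yi).
mses = [row_mse(xi, yi) for xi, yi in zip(x, y)]
print(mses)  # [1.0, 1.6666666666666667, 6.0]
```

This is still a Python-level loop over rows, so it is only useful as a correctness check, not as a fast path.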

For now, I am using sklearn.metrics.mean_squared_error like this:

from sklearn.metrics import mean_squared_error
mses = list(map(mean_squared_error, x, y))

This takes an extremely long time: in my real data each xi and yi has length 115, and there are over a million vectors in x and y.

C. Yduqoli · Accepted Answer · 2018-07-20T06:32:51.147


You can use NumPy and vectorize the whole computation instead of calling a function once per row:

import numpy as np

a = np.array(x) # your x
b = np.array(y) # your y
mses = ((a-b)**2).mean(axis=1)  # one MSE per row
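To see what `axis=1` does, here is the same expression applied to the question's 3 x 3 example. `a - b` is computed element-wise, squaring keeps the shape, and `.mean(axis=1)` averages across the columns, leaving one value per row (using `axis=0` would instead average down the columns, which is not what the question asks for).

```python
import numpy as np

x = [[1, 2, 3], [1, 3, 5], [1, 5, 9]]
y = [[2, 3, 4], [3, 4, 5], [4, 5, 6]]

a = np.asarray(x, dtype=float)  # shape (3, 3)
b = np.asarray(y, dtype=float)

sq = (a - b) ** 2               # element-wise squared errors, shape (3, 3)
mses = sq.mean(axis=1)          # average over columns -> shape (3,)
print(mses.shape)       # (3,)
print(mses.tolist())    # [1.0, 1.6666666666666667, 6.0]
```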

To check the speed at scale, here is the same computation on random data:

a = np.random.normal(size=(1000000,100))
b = np.random.normal(size=(1000000,100))
mses = ((a-b)**2).mean(axis=1)

With a matrix close to your size (1 000 000 x 100), this takes less than a second on my machine.
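One thing to keep in mind at this size: a 1 000 000 x 100 float64 array takes about 800 MB, and `(a-b)**2` allocates further temporaries of the same size. If memory is tight, a sketch like the one below processes the rows in blocks and uses `np.einsum` to sum the squared differences per row without materializing a separate squared array. The function name `chunked_row_mse` and the default `chunk` size are illustrative assumptions, not something from the original answer.

```python
import numpy as np

def chunked_row_mse(a, b, chunk=100_000):
    """Row-wise MSE of two equally shaped 2-D arrays, computed in row blocks.

    Hypothetical helper: `chunk` bounds how many rows are differenced at once,
    so the temporary difference array is about chunk x n_cols, not the full size.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape or a.ndim != 2:
        raise ValueError("a and b must be 2-D arrays of the same shape")
    n_rows, n_cols = a.shape
    out = np.empty(n_rows, dtype=np.result_type(a, b, np.float64))
    for start in range(0, n_rows, chunk):
        stop = min(start + chunk, n_rows)
        d = a[start:stop] - b[start:stop]  # differences for this block only
        # einsum("ij,ij->i") sums d * d along each row: per-row sum of squares
        out[start:stop] = np.einsum("ij,ij->i", d, d) / n_cols
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=(10_000, 115))
b = rng.normal(size=(10_000, 115))
fast = chunked_row_mse(a, b, chunk=3_000)
print(np.allclose(fast, ((a - b) ** 2).mean(axis=1)))  # True
```

Up to floating-point rounding, the result should match the one-line version. The cost is a Python loop over roughly n_rows / chunk blocks, which should be negligible for block sizes in the tens of thousands.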

answered Jul 20 '18 at 06:27
