
I have a 2-D numpy array like x = np.array([[1., 5.], [3., 4.]]). I need to compare each row with every other row in the matrix, build a new array holding the element-wise minimum of the two rows, take the sum of that minimum row, and save the sum in a new matrix. The final result is a symmetric matrix.

E.g.: comparing the row [1, 5] with itself gives the 2-D array np.array([[1., 5.], [1., 5.]]); the minimum along axis=0 is [1., 5.], and its sum is 6. Comparing [1, 5] with [3, 4] gives the minimum [1., 4.], whose sum is 5. Repeating this for all row pairs, I end up with the 2x2 matrix np.array([[6., 5.], [5., 7.]]).

import numpy as np

x = np.array([[1, 5], [3, 4]])
y = np.zeros((len(x), len(x)))
for i in range(len(x)):
    array_a = x[i]
    for j in range(len(x)):
        array_b = x[j]
        # stack the two rows, take the element-wise minimum, and sum it
        array_c = np.array([array_a, array_b])
        min_array = np.min(array_c, axis=0)
        array_sum = np.sum(min_array)
        y[i, j] = array_sum

My actual 2-D array is very big, and the operations above take a lot of time. I am new to Python, so any suggestion to improve the performance would be really helpful.


2 Answers


The obvious improvement, which saves roughly half the time, is to compute only one triangle of the symmetric matrix (j >= i in the code below) and mirror each value. For elegance and some additional saving you can also use fewer variables.

import numpy as np
import time

x = np.random.randint(0, 10, (500, 500))
y = np.zeros((len(x), len(x)))

# OP version
t0 = time.time()
for i in range(len(x)):
    array_a = x[i]
    for j in range(len(x)):
        array_b = x[j]
        array_c = np.array([array_a, array_b])
        min_array = np.min(array_c, axis=0)
        array_sum = np.sum(min_array)
        y[i, j] = array_sum
print(time.time() - t0)

z = np.zeros((len(x), len(x)))

# modified version: loop only over j >= i and mirror the result
t0 = time.time()
for i in range(len(x)):
    for j in range(i, len(x)):
        z[i, j] = np.sum(np.min([x[i], x[j]], axis=0))
        z[j, i] = z[i, j]
print(time.time() - t0)

# verify that the results are the same
print(np.all(z == y))

The results on my machine:

4.2974278926849365
2.746302604675293
True
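
As a further tweak (my addition, not included in the timings above), np.minimum computes the element-wise minimum of two rows directly, which avoids building the temporary 2xN array that np.min([x[i], x[j]], axis=0) creates on every iteration. Reusing x and z from above:

# same symmetric loop, but with np.minimum instead of np.min over a list
for i in range(len(x)):
    for j in range(i, len(x)):
        z[i, j] = np.minimum(x[i], x[j]).sum()
        z[j, i] = z[i, j]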

The obvious way to speed up your code is to do all the looping in numpy. I had a first solution (f2 in the code below) that generates a matrix containing all the row combinations to be compared and then reduces it to the final result with np.min and np.sum. Unfortunately this method is quite memory hungry and therefore becomes slow when the matrices are big, because the intermediate array is NxNx2xN for an NxN input matrix; for N = 1000 that is already 2e9 elements, i.e. about 16 GB of float64 values.

However, I found a different solution that uses one for loop (f3 below) and is reasonably fast. The speed-up over the original posted by the OP is about a factor of 4 for a 1000x1000 matrix. Here is the code with some tests:

import numpy as np
import timeit

def f(x):
    # original double loop (all functions here assume a square input)
    y = np.zeros_like(x)
    for i in range(x.shape[0]):
        a = x[i]
        for j in range(x.shape[0]):
            b = x[j]
            y[i, j] = np.sum(np.min([a, b], axis=0))
    return y

def f2(x):
    # build an NxNx2xN array holding every pair of rows, then reduce it
    y = np.empty((x.shape[0], 1, 2, x.shape[0]))
    y[:, 0, 0, :] = x[:, :]
    y = np.repeat(y, x.shape[0], axis=1)
    y[:, :, 1, :] = x[:, :]
    return np.sum(np.min(y, axis=2), axis=2)

def f3(x):
    # one loop: compare row i against all rows at once with np.minimum
    y = np.empty_like(x)
    for i in range(x.shape[0]):
        y[:, i] = np.sum(np.minimum(x[i, :], x[:, :]), axis=1)
    return y

## some tests that the functions work
x = np.array([[1, 5], [3, 4]])
a = f(x)
b = f2(x)
c = f3(x)
print(np.all(a == b))
print(np.all(a == c))

x = np.array([[1, 7, 5], [2, 3, 8], [5, 2, 4]])
a = f(x)
b = f2(x)
c = f3(x)
print(np.all(a == b))
print(np.all(a == c))

x = np.random.randint(0, 10, (100, 100))
a = f(x)
b = f2(x)
c = f3(x)
print(np.all(a == b))
print(np.all(a == c))

## some speed testing:
print('-'*50)
print("speed test small")
x = np.random.randint(0,100,(100,100))
print("original")
print(min(timeit.Timer(
    'f(x)',
    setup = 'from __main__ import f,x',
).repeat(3,10)))

print("using np.repeat")
print(min(timeit.Timer(
    'f2(x)',
    setup = 'from __main__ import f2,x',
).repeat(3,10)))

print("one for loop")
print(min(timeit.Timer(
    'f3(x)',
    setup = 'from __main__ import f3,x',
).repeat(3,10)))

print('-'*50)
print("speed test big")
x = np.random.randint(0,100,(1000,1000))
print("original")
print(min(timeit.Timer(
    'f(x)',
    setup = 'from __main__ import f,x',
).repeat(3,1)))

print("one for loop")
print(min(timeit.Timer(
    'f3(x)',
    setup = 'from __main__ import f3,x',
).repeat(3,1)))

And here is the output:

True
True
True
True
True
True
--------------------------------------------------
speed test small
original
1.3070102719939314
using np.repeat
0.15176948899170384
one for loop
0.029766165011096746
--------------------------------------------------
speed test big
original
17.505746565002482
one for loop
4.437685210024938

In other words, f2 is pretty fast for matrices that don't exhaust your memory, but for big matrices, f3 is the fastest version I could find.
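
For completeness, the all-at-once idea behind f2 can also be written with broadcasting instead of np.repeat (a sketch of mine, not part of the timings above); the intermediate array is then NxNxN instead of NxNx2xN, i.e. half the size, but it scales the same way:

def f2b(x):
    # broadcast every row against every other row: (N, N, N) intermediate
    return np.minimum(x[:, None, :], x[None, :, :]).sum(axis=2)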

EDIT:

Inspired by @Aguy's answer and this post, here is a modification that computes only the lower triangle of the matrix and then copies the results into the upper triangle:

def f4(x):
    y = np.empty_like(x)
    for i in range(x.shape[0]):
        # compare row i only against rows i and below (lower triangle)
        y[i:, i] = np.sum(np.minimum(x[i, :], x[i:, :]), axis=1)
    # mirror the lower triangle into the upper triangle
    i_upper = np.triu_indices(x.shape[0], 1)
    y[i_upper] = y.T[i_upper]
    return y

The speed test for the 1000x1000 matrix now gives:

speed test big
original
18.71281115297461
one for loop over lower triangle
2.0939957330119796

EDIT 2:

Here is a version that uses numba for a further speed-up. According to this post, it is better to write the loops out explicitly in this case:

import numba as nb

@nb.jit(nopython=True)
def f_nb(x):
    res = np.empty_like(x)
    for j in range(res.shape[1]):
        for i in range(j, res.shape[0]):
            # fill both triangles at once
            res[j, i] = res[i, j] = np.sum(np.minimum(x[i, :], x[j, :]))
    return res
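
Note that in nopython mode the first call to f_nb compiles the function, so that call should be excluded when timing, e.g.:

x = np.random.randint(0, 100, (1000, 1000))
f_nb(x)        # first call triggers the JIT compilation
res = f_nb(x)  # subsequent calls run the compiled code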

And the relevant speed tests give:

  • 0.015975199989043176 for a 100x100 matrix
  • 0.37946902704425156 for a 1000x1000 matrix
  • 467.06363476096885 for a 10000x10000 matrix

The 10000x10000 speed test for f4 didn't seem to finish at all, so I left it out. If your matrices get much bigger than that, you might actually run into memory problems -- did you consider this?
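
If memory rather than speed becomes the limit, one option (an unbenchmarked sketch of mine) is to fill the result in row blocks, so that the broadcast intermediate only ever has shape (block, N, N) instead of (N, N, N):

def f_blocked(x, block=100):
    # hypothetical helper: block controls the size of the intermediate
    n = x.shape[0]
    y = np.empty((n, n), dtype=x.dtype)
    for start in range(0, n, block):
        stop = min(start + block, n)
        y[start:stop] = np.minimum(x[start:stop, None, :],
                                   x[None, :, :]).sum(axis=2)
    return y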

  • Thanks @Thomas Kühn, your solution helped me save tons of time. Still, my matrix is very big (50000x2000) and it's taking a lot of time. Could you please advise any other solution? – Vinay Feb 16 '18 at 11:00
  • @Vinay How can it be a 50000x2000 array? This cannot be converted into a symmetric matrix. – Thomas Kühn Feb 16 '18 at 11:02
  • @Thomas Kühn it can be converted by taking the product xx^T, right? – Vinay Feb 16 '18 at 13:04
  • @Vinay was my answer helpful at all or not? If you are satisfied, you should accept one of the given answers, so that others know that the question needs no further attention. See also [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers). – Thomas Kühn Feb 19 '18 at 13:40
  • Yours is the best answer I got, but my matrix is 50000x2000 and even after 1 hour the operations are not completed. I am still trying to optimise the solution. – Vinay Feb 20 '18 at 08:51
  • @Vinay as I said before, with matrices this size you might run into memory problems. When that happens, the computer starts using swap, and then any optimisation shown here becomes useless, because the swap is going to be your bottleneck. – Thomas Kühn Feb 20 '18 at 09:03