How to vectorize populating larger matrix with items of smaller matrix in python

Question

I have some small symmetric matrices that are low dimensional representations of larger symmetric matrices. I have a vector that is a key showing which cells of the high-D matrix should be linked to which cells in the low-D matrix.

I would like to recreate these larger matrices by populating the larger matrix with its corresponding value in the low dimensional matrix. I believe there should be a vectorized approach to this, but so far all I've been able to come up with is a simple nested for loop, which is prohibitively slow for these matrices (10k+ rows & columns).

In this toy example, key is vec1, low-D matrix is source_mat, and high-D matrix is target_mat. I need to create target_mat where each cell is filled with the corresponding value from source_mat according to the key.

    import pandas as pd
    import numpy as np
    import random

    vec1=[]
    for x in range (0, 100):
        vec1.append(random.randint(0, 19)) #creating the key

    vec1=pd.DataFrame(vec1)
    sizevec1=vec1.shape[0]
    matshape=(sizevec1,sizevec1)
    target_mat=np.zeros(matshape) #key and target have same shape
    target_mat=pd.DataFrame(target_mat)

    temp=np.random.random((20,20))
    source_mat=temp*temp.T

    for row in range(0,target_mat.shape[0]):
        for column in range(0,target_mat.shape[1]):
            print 'row is ', row
            print 'column is', column
            target_mat.iloc[row,column] = source_mat.item(int(vec1.iloc[row]), int(vec1.iloc[column]))

if your posted answer doesn't work fast enough for you if you're running Python 3.5 on Windows I can compile it as a Cython program and let the compiler optimize every loop. You'd then import it as a function to call it and it would be even faster. — Matt, Jun 17 '17 at 20:06
Thanks @Matt - When I tried my answer on large scale data it still wasn't fast enough. Unfortunately I'm running 2.7 and on a Mac and Linux server- but the cython approach sounds promising. Wondering if there's a way to use map functions to vectorize the entire operation. — Aki Nikolaidis, Jun 18 '17 at 02:54
I'll see if my virtual box can handle the code when I get back to my PC. On Linux it goes to gcc though which is more tricky since my work environment is all Windows. Nonetheless when you compile a normal Python function using Cython the compiler will collapse the loops to a vectorized version if possible which would definitely improve performance — Matt, Jun 18 '17 at 03:12
Sounds great! Please do keep me posted. I'll post if I come across a way to vectorize it as well. — Aki Nikolaidis, Jun 18 '17 at 03:26
What I'll do first is compile on Windows with Cython since my build environment is already setup and see what kind of speed gains result. We're also moving everything at BP to Linux Amazon instances so if that looks good I'll try to setup a Linux GCC build but it might take some time. At least you'll have some idea of what speed increase is possible with c++ compilers and their inherit optimizations of loops. — Matt, Jun 18 '17 at 04:14
Did you look at Intel Python (free) coming with numpy/scipy: http://www.techenablement.com/orders-magnitude-performance-intel-distribution-python/ They claim 100x speed-ups in certain compute intensive codes — zam, Jun 18 '17 at 13:45

Matt · Answer 1 · 2017-06-19T17:30:14.030

This is 3x faster than your "fast" answer.

import random
import time
import numpy as np
vec1=[]
for x in range (0, 1000):
    vec1.append(random.randint(0, 19))

vec1=np.array(vec1)
sizevec1=vec1.shape[0]
matshape=(sizevec1,sizevec1)
target_mat=np.zeros(matshape)

temp=np.random.random((20,20))
source_mat=temp*temp.T
###FasterMethod###

target_mat=np.zeros(matshape)

def matrixops(vec1, source_mat, target_mat):
    matrixtime = time.time()
    for row in range(0,source_mat.shape[0]):
        for column in range(0, source_mat.shape[1]):

            rowmatch = np.array(vec1==row)
            rowmatch = rowmatch*1

            colmatch = np.array(vec1==column)
            colmatch = colmatch*1

            match_matrix=rowmatch*colmatch.T
            target_mat=target_mat+(match_matrix*source_mat[row,column])

    print((time.time() - matrixtime))

if __name__ == "__main__":
    matrixops(vec1, source_mat, target_mat)

Your fast version time: 4.246443033218384 This version time: 1.4500105381011963

And as my comment said, the Cython version isn't any faster at all. The only way to make it faster would be to take the lines that are dependent on the Python GIL and convert to C++ style operations (as I did with the == sections, writing a C++ like loop that does the same thing as the NumPy function but not supported with MemoryViews. Posted here for reference since I put a lot of time into it:

cimport numpy
from numpy import array, multiply, asarray, ndarray, zeros, dtype, int
cimport cython
from cython cimport view
from cython.parallel cimport prange #this is your OpenMP portion
from openmp cimport omp_get_max_threads #only used for getting the max # of threads on the machine

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef matrixops(int[::1] vec1, double[:,::1] source_mat, double[:,::1] target_mat):
    cdef int[::1] match_matrix =zeros(vec1.shape[0], dtype=int)
    cdef int[::1] rowmatch =zeros(vec1.shape[0], dtype=int)
    cdef int[::1] colmatch =zeros(vec1.shape[0], dtype=int)
    cdef int maxthreads = omp_get_max_threads()
    cdef int row, column, i
    # here's where you'd substitute 
    # for row in prange(source_mat.shape[0], nogil=True, num_threads=maxthreads, schedule='static'): # to use all cores
    for row in range(0,source_mat.shape[0]):
        for column in range(0, source_mat.shape[1]):
        #this is how to avoid the GIL
            for i in range(vec1.shape[0]):
                rowmatch[i]=(row==vec1[i])

            for i in range(vec1.shape[0]):
                colmatch[i]=(column==vec1[i])
            # this part has to be modified to not call Python GIL functions like was done above
            match_matrix=multiply(rowmatch,colmatch.T)
            target_mat=target_mat+(multiply(match_matrix,source_mat[row,column]))

That's your .PYX file above. If you have luck converting over you usually will see a 3x speedup on 4 cores. Sorry I wasn't successful at doing it, but 3x faster than your 100x faster solution is still decent using straight Python libraries.

@AkiNikolaidis the above answer is as fast as I can get. I can give the boilerplate Cython code but it is exactly the same speed so surprisingly, these libraries are completely optimized already!! The only thing you could do with that code is put it under a nogil=True prange statement and multithread it but it would take quite a bit of rearranging to avoid Python GIL dependent parts. — Matt, Jun 18 '17 at 22:03
Thanks a lot @Matt! I'll give it a shot- never worked with Cython before but should probably get more comfortable with Cython for tackling these problems — Aki Nikolaidis, Jun 19 '17 at 16:47
@AkiNikolaidis is the 3x speed improvement good enough (compared to your answer)? To really get that Cython code working you'd have to rewrite NumPy multiply and transpose functions to not use NumPy basically. — Matt, Jun 19 '17 at 17:18

score 1 · Accepted Answer · answered Jun 19 '17 at 21:16

Below are two separate updates to the code that led to pretty dramatically speedups.

First- figured out the vectorized solution, so now the calculation is done all in one step. This is the fastest method even after the second change-

Second- changed all pandas dataframe's to numpy arrays. This change had the greatest impact on the for loop code- which runs orders of magnitude faster now.

The code below calculates all 3 of the methods, the 'slow', 'fast', and 'Xu Mackenzie', named for friends that thought up the vectorized solution ;-P

#Initialize Variables

import time
import random
import pandas as pd
import numpy as np

n=13000
k=2000
i=0
vec1=[]
for x in range(0, n):
   vec1.append(random.randint(0, k-1))

temp=np.random.random((k,k))
#vec1=pd.DataFrame(vec1)
vec1=np.array(vec1)
#vec=pd.DataFrame(np.arange(0,300))
#vec2=pd.concat([vec,vec1], axis=1)
#sizevec1=vec1.shape[0]
sizevec1=len(vec1)
matshape=(sizevec1,sizevec1)
target_mat=np.zeros(matshape)
#target_mat=pd.DataFrame(target_mat)


source_mat=temp*temp.T
transform_mat=np.zeros((len(source_mat),len(target_mat)))

Slow Solution

matrixtime = time.time()
for row in range(0,target_mat.shape[0]):
    #print 'row is ', row
    for column in range(0,target_mat.shape[1]):

        #print 'column is', column
        target_mat[row,column] = source_mat.item(int(vec1[row]), int(vec1[column]))
print((time.time() - matrixtime))
target_mat_slow=target_mat
target_mat=np.zeros(matshape)

XU MACKENZIE SOLUTION

matrixtime = time.time()

for i in range(0,len(target_mat)):
  transform_mat[vec1[i],i]=1

temp=np.dot(source_mat,transform_mat)
target_mat=np.dot(temp.T,transform_mat)
target_mat_XM=target_mat
target_mat=np.zeros(matshape)
XM_time= time.time() - matrixtime
print((time.time() - matrixtime))

Previous 'fast' solution

matrixtime = time.time()
for row in range(0,source_mat.shape[0]):
    print 'row is ', row
    #for column in range(0, source_mat.shape[1]):
    for column in range(0, row):   
        rowmatch = np.array([vec1==row])
        rowmatch = rowmatch*1

        colmatch = np.array([vec1==column])
        colmatch = colmatch*1

        match_matrix=rowmatch*colmatch.T
        target_mat=target_mat+(match_matrix*source_mat[row,column])

print((time.time() - matrixtime))
target_mat_fast=target_mat
target_mat=np.zeros(matshape)

TEST FOR EQUIVALENCE

target_mat_slow==target_mat_fast
target_mat_fast==target_mat_XM

like it, give up and go the vectorized route! – Matt Jun 19 '17 at 23:32 — Matt, Jun 19 '17 at 23:32

score 0 · Answer 3 · answered Jun 17 '17 at 18:57

I managed to come up with a solution that offers a pretty great speedup, particularly for larger matrices. This relies on looping through the smaller matrix and populating the large matrix with it's matching elements.

I tried this solution with vec1 as a vector with 1000 elements, and found a 100x speedup over the previous method.

    import random
    import time
    import pandas as pd
    import numpy as np
    vec1=[]
    for x in range (0, 1000):
        vec1.append(random.randint(0, 19))

    vec1=pd.DataFrame(vec1)
    sizevec1=vec1.shape[0]
    matshape=(sizevec1,sizevec1)
    target_mat=np.zeros(matshape)
    target_mat=pd.DataFrame(target_mat)

    temp=np.random.random((20,20))
    source_mat=temp*temp.T

    ###Slow Method###

    matrixtime = time.time()
    for row in range(0,target_mat.shape[0]):
        for column in range(0,target_mat.shape[1]):
            #print 'row is ', row
            #print 'column is', column
            target_mat.iloc[row,column] = source_mat.item(int(vec1.iloc[row]), int(vec1.iloc[column]))
    print((time.time() - matrixtime))
    target_mat_slow=target_mat



    ###FasterMethod###

    target_mat=np.zeros(matshape)
    target_mat=pd.DataFrame(target_mat)

    matrixtime = time.time()
    for row in range(0,source_mat.shape[0]):
         for column in range(0, source_mat.shape[1]):

            rowmatch = np.array(vec1==row)
            rowmatch = rowmatch*1

            colmatch = np.array(vec1==column)
            colmatch = colmatch*1

            match_matrix=rowmatch*colmatch.T
            target_mat=target_mat+(match_matrix*source_mat[row,column])

    print((time.time() - matrixtime))
    target_mat_fast=target_mat

    #Test Equivalence
    target_mat_slow==target_mat_fast

How to vectorize populating larger matrix with items of smaller matrix in python

3 Answers3

Slow Solution

XU MACKENZIE SOLUTION

Previous 'fast' solution

TEST FOR EQUIVALENCE