I'm working on machine learning problems in which we need to evaluate pairwise interactions between data points many times. Specifically, given arbitrary data sets X (m points in k dimensions) and Y (n points in k dimensions), we need to compute the m-by-n matrix A = f(X,Y) for some bivariate function f, where A(i,j) = f(X_i,Y_j). You can think of f as a Gaussian, although in practice it is more general.
The problem is that, although X and Y are not too large (assume fewer than 800 points each), there are many such pairs (X,Y) and therefore many such matrices to compute, say A1, A2, ..., corresponding to pairs (X1,Y1), (X2,Y2), ... of different sizes.
I'm wondering whether there is an efficient way to do this in C, i.e. something faster than the naive implementation below, which uses nested for loops (simiFunc itself also contains for loops).
void similarityMatrix(int d, double **X, int nX, double **Y, int nY, double **mat) {
    for (int i = 0; i < nX; ++i) {
        for (int j = 0; j < nY; ++j) {
            mat[i][j] = simiFunc(X[i], Y[j]); // simiFunc is defined elsewhere
        }
    }
}
A follow-up question: how can dense matrix-matrix or matrix-vector products be implemented efficiently, i.e. faster than the naive implementation? (Is it possible to do better than gemm/gemv?)
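For reference, a naive matrix-vector product would look roughly like the sketch below. The name naiveMatVec and the row-major contiguous layout are assumptions for illustration, not taken from my actual code.

```c
#include <stddef.h>

/* Baseline y = A * x, where A is an m-by-n matrix stored row-major
 * and contiguously, x has n entries and y has m entries. */
void naiveMatVec(int m, int n, const double *A, const double *x, double *y) {
    for (int i = 0; i < m; ++i) {
        double acc = 0.0;
        /* Row i of A is traversed with unit stride. */
        for (int j = 0; j < n; ++j) acc += A[(size_t)i * n + j] * x[j];
        y[i] = acc;
    }
}
```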