Any matrix library order-neutral?

Question

Sorry if this is too long, but I feel the question needs to be clarified:

I'm working on a xll library for Excel, i.e., a C library containing functions which can be registered and called directly form a cell. Ideally these functions should be called (or adapted to be called) also from VBA, in order to provide an interpreted environment for more complicated calculations (root finding, ode, optimization) which do not fit well within the Excel sheet. To be clear: THERE IS a way to call a sheet function from vba (function Application.Run), but it pays the unacceptable price of conversions from/to variant type for all parameters and return value. Now the funny situation is that, within the same Excel environment, a bidimensional matrix is passed in different ways:

for sheet functions, the native Excel-C interface passes to C a matrix in the row-major order (type FP12 of the Excel SDK);
for vba, a matrix is an LPSAFEARRAY, which has data organized in column-major order.

For unidimensional data (vectors) there is a solution dated back to BLAS (30 years ago), which can be translated to C in having a structure like

struct VECTOR { 
    int n;
    int stride;
    double * data;
    double & operator [] (int i) { return data[(i - 1)*stride]; }   
}

In other words, we use for the calculations an intermediate structure which does not own data, but can map both contiguous data or data linearly spaced with a constant gap (the stride). Struct's data can be processed sequentially, but they can also translated to an array section (if using cilk):

data [ 0 : n : stride ]

When we move from vectors to matrices, I've read that it is possible to abstract from the matrix order using both a row stride and a column stride. My naif attempt could be:

struct MATRIX {
    int rows;
    int cols;
    int rowstride;
    int colstride;
    double * data;
    inline double & operator () (int i, int j)  { return data[(i - 1)*rowstride + (j - 1)*colstride]; }
    MATRIX(int nrows, int ncols, int incrw, int inccl, double* dt) {rows = nrows; cols = ncols, rowstride = incrw; colstride = inccl; data = dt; }
    MATRIX(FP12 & A)        { rows = A.rows;  cols = A.cols;  data = A.array; rowstride = cols; colstride = 1;  }
    MATRIX(LPSAFEARRAY & x) { rows = ROWS(x); cols = COLS(x); data = DATA(x); rowstride = 1;    colstride = rows; }
    int els() { return rows * cols; }
    bool isRowMajor() { return rowstride > 1; }
    bool isScalar()   { return (rows == 1) & (cols == 1); }
    bool isRow()      { return (rows == 1); }
    bool isCol()      { return (cols == 1); }
    VECTOR col(int i) { return {rows, rowstride, &data[(i - 1)*colstride] }; }      // Col(1..)
    VECTOR row(int i) { return {cols, colstride, &data[(i - 1)*rowstride] }; }      // Row(1..)
    VECTOR all()      { return {els(), 1, data}; }  
    void copyFrom   (MATRIX & B) { for (int i = 1; i <= rows; i++) ROW(*this, i) = ROW(B, i); }
    MATRIX & Transp (MATRIX & B) { for (int i = 1; i <= rows; i++) ROW(*this, i) = COL(B, i); return *this; }
    void BinaryOp   (BinaryFcn f, MATRIX & B);
    MATRIX TranspInPlace() { return MATRIX(cols, rows, colstride, rowstride, data); }
    MATRIX SubMatrix(int irow, int icol, int nrows, int ncols) { return MATRIX(nrows, ncols, rowstride, colstride, &(*this)(irow, icol)); }
};

The two constructors from an FP12 or an LPSAFEARRAY initialize the structure pointing to data which is row-major or column-major organized. Is this order-neutral ? In my opinion yes: both data access (indexing) and row/column selection are correct independently from the order. The indexing is slower given the two multiplications, but it is possible to map very quickly rows and columns: after all the purpose of a matrix library is to minimize the single data access. Further, it is very easy to write macros that return an array section for a row or column, and for the whole matrix also:

#define COL(A,i) (A).data[(i-1)*(A).colstride : (A).rows : (A).rowstride]           // COL(A,1)
#define ROW(A,i) (A).data[(i-1)*(A).rowstride : (A).cols : (A).colstride]           // ROW(A,1)
#define FULL(A)  (A).data[0 : (A).rows * (A).cols]                                  // FULL MATRIX

In the above code indexes start from 1, but even this could be abstracted using a (modifiable) ofs 0-1 parameter in place of the hard-coded 1. The all() function / FULL() macro are correct only for a whole, contiguous matrix, not a submatrix. But also this could be adjusted, switching in this case to a loop over the rows. More or less like the above copyFrom method(), which can transform (copying) a matrix between a row-major and a column-major representation.

Note also the TranspInPlace method: by swapping rows/cols and rowstride/colstride we have a logical transposition of the same, untouched data, which means no more need to pass logical switches to an all-purpose routine (for example, GEMM for matrix multiplication, even though (to be fair) this does not cover the case of conjugate transposition).

Given the above, I'm looking for a library implementing this approach so that I can use (hook) it, but my search till now is not satisfactory:

GSL Gsl uses row-major order. Stop.

LAPACK The native code is column-major. The C-interface gives the possibility to treat row-major data, but only adding tailor-made code or (in some routine) physically transposing the matrix before operating on it.

EIGEN Being a templated library it can treat both. Unfortunately the matrix order is a parameter of the template, which means that every matrix method will be duplicated. It works, but is not ideal.

Please let me know if the are librarie closer to what I am after. Thx.

Row and colmn major differs by swapping A times B with B times A, and interpreting the output diferently, no? — Yakk - Adam Nevraumont, Nov 05 '16 at 12:02
In the two cases the same matrix is mapped in **linear** memory in different ways, e.g. [1 2 3; 4 5 6; 7 8 9] in memory is: row-major: [1 2 3 4 5 6 7 8 9], col-major: [1 4 7 2 5 8 3 6 9] — maurizio nardò, Nov 05 '16 at 15:33

score 1 · Answer 1 · answered Nov 05 '16 at 13:23

1

In Eigen, you can mimic this using a Map with runtime inner and outer strides. For instance, if you stick with ColumnMajor then the inner-stride corresponds to the row-stride, and the outer-stride corresponds to the column-stride:

Map<MatrixXd,0,Stride<> > mat(ptr, rows, cols, Stride<>(colStride,rowStride));

Then you can do any operations on mat, like accessing rows mat.row(i), columns mat.col(j), performs products, solve linear systems, etc.

The main drawback with that approach is that you loose SIMD vectorization.

answered Nov 05 '16 at 13:23

ggael

28,425
2
65
71

I'll have a look. For a row-major (like the FP12) i guess the strides are reversed, right? I do not have big problems in coding dense blas 1-3 with the above structure: a mixed, perhaps viable solution might be to use the above structure and have trampoline maps for things like linear system decomposition, etc. What do you think of this idea? – maurizio nardò Nov 05 '16 at 15:30
Even for matrix products much better use Eigen that will make the best use of your CPU (register/cache reuse, vectorization, etc.). – ggael Nov 05 '16 at 21:34

Any matrix library order-neutral?

1 Answers1