Let's walk through three different ways to do this computation (including the inverse) on OS X or iOS.
First, let's do what you were more-or-less trying to do to start with: we'll do the computation using LAPACK and BLAS from the Accelerate framework. Note that I've used the BLAS function cblas_sgemv instead of vDSP_mmul to do the matrix-vector multiplication, for three reasons. First, it's more idiomatic in code that uses LAPACK. Second, LAPACK really wants matrices stored in column-major order, which BLAS supports but vDSP does not. Finally, vDSP_mmul is actually just a wrapper around the BLAS matrix-multiply routines, so we can cut out the middleman. This will work on OS X back to 10.2 and iOS back to 4.0, i.e. on any target you can reasonably expect to encounter today.
#include <Accelerate/Accelerate.h>
#include <stdio.h>
#include <string.h>
void using_blas_and_lapack(void) {
    printf("Using BLAS and LAPACK:\n");
    // Note: you'll want to store A in *column-major* order to use it with
    // LAPACK (even though it's not strictly necessary for this simple example,
    // if you try to do something more complex you'll need it).
    float A[3][3] = {{-4,-3,-5}, { 6,-7, 9}, { 8,-2,-1}};
    float x[3] = { -1, 10, -3 };
    // Compute b = Ax using cblas_sgemv.
    float b[3];
    cblas_sgemv(CblasColMajor, CblasNoTrans, 3, 3, 1.f, &A[0][0], 3, x, 1, 0.f, b, 1);
    printf("b := A x = [ %g, %g, %g ]\n", b[0], b[1], b[2]);
    // You probably don't actually want to compute A^-1; instead you simply
    // want to solve the equation Ay = b for y (like y = A\b in matlab). To
    // do this with LAPACK, you typically use the routines sgetrf and sgetrs.
    // sgetrf will overwrite its input matrix with the LU factorization, so
    // we'll make a copy first in case you need to use A again.
    float factored[3][3];
    memcpy(factored, A, sizeof factored);
    // Now that we have our copy, go ahead and factor it using sgetrf.
    __CLPK_integer n = 3;
    __CLPK_integer info = 0;
    __CLPK_integer ipiv[3];
    sgetrf_(&n, &n, &factored[0][0], &n, ipiv, &info);
    if (info != 0) { printf("Something went wrong factoring A\n"); return; }
    // Finally, use the factored matrix to solve y = A\b via sgetrs. Just
    // like sgetrf overwrites the matrix with its factored form, sgetrs
    // overwrites the input vector with the solution, so we'll make a copy
    // before calling the function.
    float y[3];
    memcpy(y, b, sizeof y);
    __CLPK_integer nrhs = 1;
    sgetrs_("No transpose", &n, &nrhs, &factored[0][0], &n, ipiv, y, &n, &info);
    printf("y := A\\b = [ %g, %g, %g ]\n\n", y[0], y[1], y[2]);
}
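As the comments above note, you usually want to solve Ay = b rather than form A^-1 explicitly. If you genuinely do need the inverse itself (say, to apply it to a large number of vectors), LAPACK provides sgetri, which takes the factorization produced by sgetrf and overwrites it with A^-1. Here's a minimal sketch along those lines; the function name is mine, and the fixed workspace size is an assumption (robust code would query the optimal size by first calling sgetri_ with lwork = -1):

#include <Accelerate/Accelerate.h>
#include <stdio.h>
#include <string.h>

// Sketch: forming A^-1 explicitly with sgetrf + sgetri.
void explicit_inverse(void) {
    // Same column-major A as above.
    float A[3][3] = {{-4,-3,-5}, { 6,-7, 9}, { 8,-2,-1}};
    float inverse[3][3];
    memcpy(inverse, A, sizeof inverse);
    __CLPK_integer n = 3, info = 0;
    __CLPK_integer ipiv[3];
    // Factor first, exactly as before.
    sgetrf_(&n, &n, &inverse[0][0], &n, ipiv, &info);
    if (info != 0) { printf("Something went wrong factoring A\n"); return; }
    // sgetri overwrites the LU factorization with A^-1.
    float work[9];                 // assumed workspace size; 3*n is ample here
    __CLPK_integer lwork = 9;
    sgetri_(&n, &inverse[0][0], &n, ipiv, work, &lwork, &info);
    if (info != 0) { printf("Something went wrong inverting A\n"); return; }
    // inverse now holds A^-1 in column-major order.
}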
Next, let's do the computation using LinearAlgebra.h, a new feature of the Accelerate framework in iOS 8.0 and OS X 10.10. It abstracts away nearly all of the bookkeeping necessary for these computations, but it offers only a small slice of the functionality available in LAPACK and BLAS. Note that while there's some boilerplate necessary to convert our raw arrays to and from la_object_t objects, once we have them the actual computation is very straightforward.
#include <Accelerate/Accelerate.h>
#include <stdio.h>
void using_la(void) {
    printf("Using LA:\n");
    // LA accepts row-major as well as column-major data, but requires a short
    // two-step dance to get our data in.
    float Adata[3][3] = {{-4, 6, 8}, {-3,-7,-2}, {-5, 9,-1}};
    float xdata[3] = { -1, 10, -3 };
    la_object_t A = la_matrix_from_float_buffer(&Adata[0][0], 3, 3, 3, LA_NO_HINT, LA_DEFAULT_ATTRIBUTES);
    la_object_t x = la_vector_from_float_buffer(xdata, 3, 1, LA_DEFAULT_ATTRIBUTES);
    // Once our data is stored as LA objects, it's easy to do the computation:
    la_object_t b = la_matrix_product(A, x);
    la_object_t y = la_solve(A, b);
    // And finally we need to get our data back out:
    float bdata[3];
    if (la_vector_to_float_buffer(bdata, 1, b) != LA_SUCCESS) {
        printf("Something went wrong computing b.\n");
        return;
    }
    printf("b := A x = [ %g, %g, %g ]\n", bdata[0], bdata[1], bdata[2]);
    float ydata[3];
    if (la_vector_to_float_buffer(ydata, 1, y) != LA_SUCCESS) {
        printf("Something went wrong computing y.\n");
        return;
    }
    printf("y := A\\b = [ %g, %g, %g ]\n\n", ydata[0], ydata[1], ydata[2]);
}
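One wrinkle worth knowing about: LA propagates errors lazily through a computation, so a failed solve (e.g. a singular matrix) may not surface until you try to copy the data out. You can inspect an intermediate result with la_status. Here's a small sketch; the helper name is hypothetical, not part of the LA API:

#include <Accelerate/Accelerate.h>
#include <stdio.h>

// Sketch: checking an LA object's status before extracting its data.
// solve_or_report is a hypothetical helper of my own.
static int solve_or_report(la_object_t A, la_object_t b, float y[3]) {
    la_object_t result = la_solve(A, b);
    // LA_SUCCESS indicates a valid result; anything else is a warning or error.
    if (la_status(result) != LA_SUCCESS) {
        printf("la_solve failed with status %ld\n", (long)la_status(result));
        return -1;
    }
    return la_vector_to_float_buffer(y, 1, result) == LA_SUCCESS ? 0 : -1;
}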
Finally, one more method. If your matrices are really just 3x3, this is the one I would use, because the overhead involved in either of the previous methods will swamp the actual computation. However, it only works for matrices of size 4x4 or smaller. iOS 8.0 and OS X 10.10 also add another new header, <simd/simd.h>, focused specifically on small matrix and vector math, which makes this really simple and efficient:
#include <simd/simd.h>
#include <stdio.h>
void using_simd(void) {
    printf("Using <simd/simd.h>:\n");
    matrix_float3x3 A = matrix_from_rows((vector_float3){-4,  6,  8},
                                         (vector_float3){-3, -7, -2},
                                         (vector_float3){-5,  9, -1});
    vector_float3 x = { -1, 10, -3 };
    vector_float3 b = matrix_multiply(A, x);
    printf("b := A x = [ %g, %g, %g ]\n", b[0], b[1], b[2]);
    vector_float3 y = matrix_multiply(matrix_invert(A), b);
    printf("y := A^-1 b = [ %g, %g, %g ]\n\n", y[0], y[1], y[2]);
}
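A single-precision invert-and-multiply can lose a little accuracy relative to a factored solve, so it's cheap to sanity-check the result by measuring the residual ||Ay - b||. Here's a quick sketch using the same data; the function name is mine, while the vector subtraction and vector_length come straight from <simd/simd.h>:

#include <simd/simd.h>
#include <stdio.h>

// Sketch: sanity-checking the simd solve via the residual norm.
void check_residual(void) {
    matrix_float3x3 A = matrix_from_rows((vector_float3){-4,  6,  8},
                                         (vector_float3){-3, -7, -2},
                                         (vector_float3){-5,  9, -1});
    vector_float3 x = { -1, 10, -3 };
    vector_float3 b = matrix_multiply(A, x);
    vector_float3 y = matrix_multiply(matrix_invert(A), b);
    // For a well-conditioned 3x3 system, this should be within a few
    // ulps of zero.
    printf("||A y - b|| = %g\n", vector_length(matrix_multiply(A, y) - b));
}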
Let's wrap up by double-checking that all three methods give identical results (up to small rounding differences):
scanon$ xcrun -sdk macosx clang matrix.m -framework Accelerate && ./a.out
Using BLAS and LAPACK:
b := A x = [ 40, -61, 98 ]
y := A\b = [ -0.999999, 10, -3 ]
Using LA:
b := A x = [ 40, -61, 98 ]
y := A\b = [ -1, 10, -3 ]
Using <simd/simd.h>:
b := A x = [ 40, -61, 98 ]
y := A^-1 b = [ -1, 10, -3 ]