Matrix Inversion with distributed computing in Apache Spark

Question

I need a way to compute the inverse of a matrix stored in a distributed data type in Spark. The data is purely numerical, and any way to perform this operation on a `RowMatrix`, DataFrame, or RDD would be incredibly useful. While there are some Stack Overflow posts on this, they involve conversion to local data types, which is simply not feasible for the amount of data I am handling.

I've looked into Breeze for Scala and `DenseMatrix` in Spark, but it seems these are not distributed and may not scale as needed.

It sounds like you're looking for the same thing as https://stackoverflow.com/questions/29969521/how-to-compute-the-inverse-of-a-rowmatrix-in-apache-spark, but with a function that returns a distributed matrix, not a `DenseMatrix`. Is that right? — Zafar, Jul 01 '19 at 19:48
Exactly that. I have the same problem but I want to keep it distributed — Cheezbeez, Jul 01 '19 at 20:19
Why do you want to invert a matrix in the first place? For anything larger than 4x4, inverse matrices exist only in theoretical linear algebra books. Ah, there is even [a blog post on that](https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/). — Andrey Tyukin, Jul 03 '19 at 16:45
I'm trying to implement a cluster robust standard error calculation in Spark. To my knowledge, calculating the matrix inverse is an integral part of this, but there may be some workarounds I'm not familiar with. — Cheezbeez, Jul 03 '19 at 20:33
@AndreyTyukin do you know about this? I have a similar problem. — Zafar, Jul 03 '19 at 21:58
@Cheezbeez Does your formula look something like `q(X^T X)^{-1}(X^T X)^{-1}`, where `X` is some matrix generated from some data points, and the `(X^T X)^{-1}` is the "inverse" that you are trying to find? In this case I'd suggest that you start your search [here](https://en.wikipedia.org/wiki/Iterative_method#Linear_systems). I'm almost sure that there must be efficient iterative algorithms for the special case of `A=X^T X` with a sparse `X`, I just can't recall the name. — Andrey Tyukin, Jul 03 '19 at 22:18
@Zafar Please see my previous comment. I'd suggest grabbing any sufficiently thick book on numerical linear algebra and checking what it says first, before attempting to compute an explicit inverse of some gigantic sparse matrix that barely fits into a Spark cluster. — Andrey Tyukin, Jul 03 '19 at 22:20
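
As a rough illustration of the iterative route suggested in the comments, the sketch below solves `(X^T X) w = b` with the conjugate gradient method, one standard iterative solver for symmetric positive definite systems. It is not necessarily the algorithm the commenter had in mind. It never forms `X^T X` or its inverse: each iteration only needs the product `(X^T X) v`, which is a sum over the rows of `X` and is therefore the part that could be distributed. This is a plain-Python sketch, not Spark code. The names `gram_matvec` and `solve_normal_cg` are hypothetical, a local list of rows stands in for the distributed data, and `X` is assumed to have full column rank.

```python
from typing import List, Sequence

Row = Sequence[float]


def gram_matvec(rows: List[Row], v: Sequence[float]) -> List[float]:
    """Return (X^T X) v as the sum over rows x of x * (x . v).

    X^T X is never materialized. In a distributed setting this loop is
    the per-row work that would be mapped over partitions and summed.
    """
    out = [0.0] * len(v)
    for x in rows:
        dot = sum(xi * vi for xi, vi in zip(x, v))
        for j in range(len(v)):
            out[j] += x[j] * dot
    return out


def solve_normal_cg(rows: List[Row], b: Sequence[float],
                    tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve (X^T X) w = b by conjugate gradient.

    Assumption: X has full column rank, so X^T X is positive definite.
    """
    w = [0.0] * len(b)
    r = list(b)          # residual b - A w, with w = 0 initially
    p = list(r)          # search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = gram_matvec(rows, p)
        alpha = rs_old / sum(pi * ai for pi, ai in zip(p, Ap))
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


# Small example: X^T X = [[2, 1], [1, 5]], so the solution of
# (X^T X) w = [3, 6] is w = [1, 1].
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
w = solve_normal_cg(X, [3.0, 6.0])
```

If a formula needs `(X^T X)^{-1}` applied to several vectors, the same solver can be called once per vector instead of building the inverse explicitly.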

0 Answers