6

I am writing code to compute Classical Multidimensional Scaling (abbreviated to MDS) of a very large n by n matrix, n = 500,000 in my example.

In one step of MDS, I need to compute the three largest eigenvalues and their corresponding eigenvectors of an n by n matrix, called the B matrix. I only need these three eigenvalues and eigenvectors. Common methods of calculating the eigenvalues and eigenvectors of a large matrix take a long time, and I do not require a very accurate answer, so I am looking for a way to estimate them.

Some parameters:

  1. The B matrix is symmetric, real, and quite dense
  2. The eigenvalue decomposition of B in theory should always produce real numbers.
  3. I do not require an entirely precise estimate, just a fast one. It needs to complete within a few hours.
  4. I write in Python and C++

My question: Are there fast methods of estimating the three largest eigenvalues, and their eigenvectors, of such a large B matrix?

My progress: I have found a method of approximating the highest eigenvalue of a matrix, but I do not know if I can generalize it to the highest three. I have also found this paper written in 1996, but it is extremely technical and hard for me to read.

Anshul Goyal
Paul Terwilliger
  • A matrix that size would require more than a terabyte of storage given 64-bit floating-point entries. Forget eigenvectors -- even doing a single matrix-vector multiplication looks painful. – David Eisenstat Nov 30 '16 at 20:44
  • But there is no need to store the original matrix! It is indirectly given in the MDS algorithm and you can use that to perform matrix-vector multiplication without first computing the matrix. – Hans Olsson Dec 01 '16 at 08:55
  • Have you looked at approximate MDS meant for big data? E.g. see http://pike.cs.ucla.edu/~weiwang/paper/CIMCV06.pdf – Gene Dec 04 '16 at 19:05

3 Answers

8

G. Golub and C. F. Van Loan, in chapter 9 of Matrix Computations (2nd edition), state that Lanczos algorithms are one choice for this. The matrix should ideally be sparse, but the method clearly works for dense matrices as well.

https://en.wikipedia.org/wiki/Lanczos_algorithm
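
A minimal sketch of how this could look in Python, assuming SciPy is available. `scipy.sparse.linalg.eigsh` wraps ARPACK's implicitly restarted Lanczos method and only needs a matrix-vector product, so B = -1/2 · J · D2 · J never has to be formed. The helper names (`double_centered_operator`, `top_eigenpairs`) and the toy data are hypothetical, chosen for illustration only.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def double_centered_operator(D2):
    """Wrap B = -1/2 * J @ D2 @ J as a LinearOperator without forming J or B.

    D2 holds squared distances; J = I - ones((n, n)) / n is the centering
    matrix, so applying J to a vector just subtracts the vector's mean.
    """
    n = D2.shape[0]

    def center(v):
        return v - v.mean()

    def matvec(v):
        v = np.asarray(v, dtype=float).ravel()
        return -0.5 * center(D2 @ center(v))

    return LinearOperator((n, n), matvec=matvec, dtype=float)


def top_eigenpairs(op, k=3, tol=1e-6):
    """Return the k algebraically largest eigenpairs, largest first."""
    vals, vecs = eigsh(op, k=k, which="LA", tol=tol)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Toy data (an assumption for the demo): 200 points in 3-D.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 1.0])
sq = (points ** 2).sum(axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T

vals, vecs = top_eigenpairs(double_centered_operator(D2))
coords = vecs * np.sqrt(vals)  # classical MDS coordinates, one row per item
```

For n = 500,000 even `D2` would not fit in memory, so the `D2 @ v` product inside `matvec` would have to be replaced by a routine that computes it in blocks from the raw data. The loose `tol` reflects the stated preference for speed over accuracy.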

Hans Olsson
2

You can compute the dominant eigenvector of B and then use it to transform the data into B'. Dropping the first column of B' gives B'', and the dominant eigenvector of B'' carries enough information to construct a plausible second eigenvector of B. Repeat the process for the third.
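
One standard way to realize this "find one, remove it, repeat" idea is power iteration with Hotelling deflation, which is not necessarily the exact transformation described above. The sketch below assumes B is symmetric and that its three largest eigenvalues are also the largest in magnitude and reasonably well separated; all function names and the test matrix are hypothetical.

```python
import numpy as np


def power_iteration(matvec, n, iters=500, tol=1e-12, seed=0):
    """Estimate the dominant eigenpair of a symmetric operator."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        lam_new = v @ w  # Rayleigh quotient of the current iterate
        norm = np.linalg.norm(w)
        if norm == 0.0:
            return 0.0, v
        v = w / norm
        converged = abs(lam_new - lam) <= tol * max(abs(lam_new), 1.0)
        lam = lam_new
        if converged:
            break
    return v @ matvec(v), v


def top_k_by_deflation(B, k=3):
    """Find k eigenpairs one at a time, subtracting each one once found."""
    n = B.shape[0]
    vals, vecs = [], []
    for i in range(k):
        def matvec(x):
            # Apply B minus the eigenpairs already found (Hotelling deflation).
            y = B @ x
            for lam, u in zip(vals, vecs):
                y = y - lam * (u @ x) * u
            return y

        lam, v = power_iteration(matvec, n, seed=i)
        vals.append(lam)
        vecs.append(v)
    return np.array(vals), np.column_stack(vecs)


# Toy symmetric matrix with a known spectrum (an assumption for the demo).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
spectrum = np.concatenate(([10.0, 6.0, 3.0], np.linspace(1.0, 0.1, 47)))
B = (Q * spectrum) @ Q.T

vals, vecs = top_k_by_deflation(B, k=3)
```

Errors in an earlier eigenvector leak into the later ones, and convergence slows when eigenvalues are close together, so a Lanczos-type method is usually the safer choice when more than one eigenpair is needed.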

About speed: you can randomly sample that huge dataset down to N items. Since you only want three dimensions, you can probably discard most of the data and still get a good overview of the eigenvectors. Think of it as an 'electoral poll'. I cannot help you measure the error rate, but I would try sampling 1k items several times and checking whether the results are more or less the same.

Now you can get the mean of several 'polls' to build a 'prediction'.
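
A rough sketch of such repeated 'polls', restricted to the eigenvalues. It assumes the distances are Euclidean, in which case the eigenvalues of B grow roughly in proportion to the number of items, so each poll of m items is rescaled by n / m. That rescaling, the function names, and the toy data are assumptions made for illustration; they are not part of the description above.

```python
import numpy as np


def classical_mds_eigs(D2, k=3):
    """Top-k eigenvalues of B = -1/2 * J @ D2 @ J for a small D2."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    return np.linalg.eigvalsh(B)[::-1][:k]


def poll_eigenvalues(D2, m, polls, k=3, seed=0):
    """Average the rescaled top-k eigenvalues over several random subsamples."""
    rng = np.random.default_rng(seed)
    n = D2.shape[0]
    estimates = []
    for _ in range(polls):
        idx = rng.choice(n, size=m, replace=False)
        sub = D2[np.ix_(idx, idx)]  # only an m-by-m block is ever needed
        estimates.append(classical_mds_eigs(sub, k) * (n / m))
    estimates = np.array(estimates)
    return estimates.mean(axis=0), estimates.std(axis=0)


# Toy data (an assumption for the demo): 1000 points in 3-D.
rng = np.random.default_rng(2)
points = rng.normal(size=(1000, 3)) * np.array([5.0, 2.0, 1.0])
sq = (points ** 2).sum(axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T

mean_est, spread = poll_eigenvalues(D2, m=200, polls=10)
full = classical_mds_eigs(D2)  # affordable here, not at n = 500,000
```

The spread across polls gives a crude indication of how stable the estimate is. Eigenvectors from different polls cover different items, so they cannot simply be averaged in the same way.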

robermorales
0

Have a look at the suggestions in this thread:

Largest eigenvalues (and corresponding eigenvectors) in C++

As suggested there, you can use the ARPACK package, which has a C++ interface.

AdityaG