
I have a matrix of dimensions 20,000,000*3 stored in a file. I want to access it very fast. How can I do it? I can't declare a map of that size. What do I do? Please help.

Anirvana

2 Answers


There are a number of possibilities:

  • If the matrix is sparse, load it into a map, leaving out the entries where the value is zero.

  • If the file version of the matrix has fixed sized records, then create a memory-mapped buffer and use indexing to access individual cells.

  • If the access pattern is sequential, then just read it.

  • and so on.
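For the sparse case, one possible sketch (the class and the packed-`long` key scheme are my own, not something from the question) is a `HashMap` where absent entries implicitly mean zero:

```java
import java.util.HashMap;
import java.util.Map;

public class SparseMatrix {
    // Pack (row, col) into one long key so we don't allocate a key object per cell.
    private final Map<Long, Integer> cells = new HashMap<>();

    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    public void set(int row, int col, int value) {
        if (value != 0) {                  // leave out zero entries entirely
            cells.put(key(row, col), value);
        }
    }

    // Absent entries are implicitly zero, so only non-zero cells use memory.
    public int get(int row, int col) {
        Integer v = cells.get(key(row, col));
        return v == null ? 0 : v;
    }

    public static void main(String[] args) {
        SparseMatrix m = new SparseMatrix();
        m.set(1_000_000, 2, 42);
        System.out.println(m.get(1_000_000, 2)); // 42
        System.out.println(m.get(0, 0));         // 0
    }
}
```

This only pays off if most of the matrix is zero; for a dense 20,000,000*3 matrix the boxing and map overhead would be far worse than a flat representation.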

If you want a more specific answer you will need to provide more details; e.g. how the file is represented, is the matrix sparse, what are the access patterns, do you need to update the matrix, etc.


Matrix is not sparse. It basically contains 3 rows, all ints. It is stored in a text file in the format <row1> <row2> <row3>. I don't need to update it; I simply need to perform a number of searches on the values in row 1.

OK.

  • Convert the file to a binary format. This will make each row occupy the same number of bytes and make random access feasible.

  • Searching on the values in row1 suggests that you need to sort the columns in the file so that the rows are ordered on row1. (Alternatively, if you also need to lookup rows by original row number, create an index on column #1.)

  • Then map the sorted / indexed / converted file(s) into memory using a MappedByteBuffer and access it via an IntBuffer.

The total size of the mapped file should be in the region of 240 MB (20,000,000 records × 3 ints × 4 bytes), which shouldn't be a problem on a typical PC these days.
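A sketch of the convert-then-map steps above (the file names, and the assumption that each record is three 4-byte ints so record i starts at byte offset i * 12, are mine):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class MatrixAccess {
    static final int INTS_PER_RECORD = 3;

    // Convert the "<row1> <row2> <row3>" text format to fixed-size binary records.
    static void convert(Path text, Path binary) throws IOException {
        try (FileChannel out = FileChannel.open(binary,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocate(INTS_PER_RECORD * 4);
            for (String line : Files.readAllLines(text)) {
                buf.clear();
                for (String field : line.trim().split("\\s+")) {
                    buf.putInt(Integer.parseInt(field));
                }
                buf.flip();
                out.write(buf);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path text = Path.of("matrix.txt");
        Path bin = Path.of("matrix.bin");
        // Tiny stand-in for the real 20,000,000-line file.
        Files.write(text, List.of("1 2 3", "40 50 60", "700 800 900"));
        convert(text, bin);

        try (FileChannel ch = FileChannel.open(bin, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            IntBuffer ints = mapped.asIntBuffer();
            // Random access: cell (record, col) lives at int index record * 3 + col.
            System.out.println(ints.get(1 * INTS_PER_RECORD + 2)); // 60
        }
    }
}
```

Because the records are fixed size, the OS pages in only the parts of the file you touch, and the index arithmetic replaces any in-memory data structure.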

Stephen C
  • Matrix is not sparse. It basically contains 3 rows, all ints. Matrix is stored in a text file in the format <row1> <row2> <row3>. I don't need to update it; I simply need to perform a number of searches on the values present in row 1. – Anirvana Jun 25 '11 at 14:48
0

Rather than a Map I would use arrays; however, even these might be too big. Try to find out which parts of the matrix are actually used for computation, and look into divide-and-conquer / parallel matrix algorithms, which often decompose matrices into smaller ones, either structurally or by exploiting matrix multiplication, eigenvalues and other algebraic properties of matrices. You can then use techniques like buffering and caching to speed up access to the data on disk.
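Given that the asker only needs to search the values of row 1, the array approach can be as simple as loading that one row into a plain `int[]` (about 80 MB for 20,000,000 ints), sorting it once, and binary-searching per lookup. A minimal sketch with made-up data:

```java
import java.util.Arrays;

public class ArraySearch {
    public static void main(String[] args) {
        // Stand-in for the 20,000,000 values of row 1 read from the file.
        int[] row1 = {42, 7, 99, 7, 13};

        int[] sorted = row1.clone();
        Arrays.sort(sorted);                         // O(n log n), paid once

        // Each search is then O(log n); a non-negative result means "found".
        boolean found = Arrays.binarySearch(sorted, 13) >= 0;
        System.out.println(found); // true
    }
}
```

If the original positions of the matched values are also needed, sort an index array by the row-1 values instead of sorting the values directly.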

Gabriel Ščerbák