I have a matrix of dimensions 20,000,000*3 stored in a file, and I need to access it very quickly. How can I do that? I can't declare a map of that size. What should I do? Please help.
-
Surely this is a sparse matrix, right? – Hovercraft Full Of Eels Jun 25 '11 at 13:03
-
Depends on the contents of the matrix. What kind of data is it? – M Platvoet Jun 25 '11 at 13:04
-
Duplicate of http://stackoverflow.com/questions/87679/advice-on-handling-large-data-volumes and http://stackoverflow.com/questions/140056/java-advice-on-handling-large-data-volumes-part-deux – THelper Jun 25 '11 at 13:08
-
How is the matrix represented in the file? – Thorbjørn Ravn Andersen Jun 25 '11 at 13:35
-
@Hovercraft Full Of Eels: It's not sparse. @M Platvoet: It contains int data. @Thorbjørn Ravn Andersen: The matrix is represented as
....
2 Answers
There are a number of possibilities:
If the matrix is sparse, load it into a map, leaving out the entries where the value is zero.
If the file version of the matrix has fixed-size records, then create a memory-mapped buffer and use indexing to access individual cells.
If the access pattern is sequential, then just read it.
and so on.
If you want a more specific answer you will need to provide more details; e.g. how the file is represented, is the matrix sparse, what are the access patterns, do you need to update the matrix, etc.
The matrix is not sparse. It basically contains 3 rows, all ints. The matrix is stored in a text file in the format
<row1>
<row2>
<row3>
I don't need to update it; I simply need to perform a number of searches on the values in row 1.
OK.
Convert the file to a binary format. This will make each row occupy the same number of bytes and make random access feasible.
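As a rough illustration of the conversion step, here is a minimal sketch. It assumes the text file holds whitespace-separated decimal ints (the exact text layout is not shown in the question, so this is a guess), and it writes them in file order as 4-byte big-endian values. The class name MatrixTextToBinary and the method convert are made up for this example.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library.
final class MatrixTextToBinary {

    /**
     * Streams every int from the text file into the binary file, 4 bytes each
     * (big-endian, as written by DataOutputStream). Returns the number of ints written.
     * Assumption: values are separated by spaces and/or newlines.
     */
    static long convert(Path text, Path binary) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(text, StandardCharsets.UTF_8);
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(binary)))) {
            // By default StreamTokenizer parses numbers and treats newlines as whitespace.
            StreamTokenizer tokens = new StreamTokenizer(in);
            while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokens.ttype != StreamTokenizer.TT_NUMBER) {
                    throw new IOException("Unexpected token in input: " + tokens);
                }
                // nval is a double; every int value is exactly representable in it.
                out.writeInt((int) tokens.nval);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long written = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("ints written: " + written);
    }
}
```

With this layout, the value at (row, col) sits at byte offset (row * columns + col) * 4, where columns is the total int count divided by 3.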
Searching on the values in row1 suggests that you need to sort the columns in the file so that the rows are ordered on row1. (Alternatively, if you also need to look up rows by original row number, create an index on column #1.)
Then map the sorted / indexed / converted file(s) into memory using a MappedByteBuffer and access it via an IntBuffer.
The total size of the mapped file should be in the region of 240 MB (20,000,000 × 3 ints at 4 bytes each), which shouldn't be a problem on a typical PC these days.
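The mapping and search steps could look roughly like the sketch below. It assumes a binary file of big-endian 4-byte ints laid out as all of row1, then all of row2, then all of row3, with the columns already reordered so that row1 is ascending. That layout, the class name MappedMatrix and its methods are assumptions made for this example, not something given in the question.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper around a memory-mapped 3-row int matrix.
final class MappedMatrix {
    private final IntBuffer ints;
    private final int cols;

    MappedMatrix(Path binary) throws IOException {
        try (FileChannel ch = FileChannel.open(binary, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            // A MappedByteBuffer is big-endian by default.
            ints = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
        }
        cols = ints.capacity() / 3;
    }

    /** Value at the given zero-based row (0..2) and column. */
    int get(int row, int col) {
        return ints.get(row * cols + col);
    }

    /** Binary search over row 0; returns a matching column index, or -1 if absent. */
    int findColumn(int key) {
        int lo = 0;
        int hi = cols - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int value = ints.get(mid);
            if (value < key) {
                lo = mid + 1;
            } else if (value > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        MappedMatrix m = new MappedMatrix(Paths.get(args[0]));
        int col = m.findColumn(Integer.parseInt(args[1]));
        System.out.println(col < 0 ? "not found" : m.get(1, col) + " " + m.get(2, col));
    }
}
```

Each lookup touches about 25 positions for 20,000,000 columns, and the operating system pages in only the parts of the file that are actually read.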

-
The matrix is not sparse. It basically contains 3 rows, all ints. The matrix is stored in a text file in the format <row1> <row2> <row3>.
I don't need to update it; I simply need to perform a number of searches on the values present in row 1.
I would use arrays rather than a Map, although this might be too big. Try to find out which parts of the matrix are used for computation, and look into divide-and-conquer or parallel matrix algorithms, which often decompose matrices into smaller matrices, either structurally or using matrix multiplication, eigenvalues and other algebraic properties of matrices. You can then do various things like buffering and caching to speed up access to the data on disk.
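One possible way to use plain arrays for the searches on row 1 is sketched below. It packs each row-1 value together with its original column index into a long, sorts the long[] once, and then answers lookups with a binary search. The packing trick and the class name ArrayIndex are assumptions for illustration; for 20,000,000 columns the index alone needs about 160 MB of heap, on top of whatever holds the rows themselves.

```java
import java.util.Arrays;

// Hypothetical in-memory index over row 1, built from primitive arrays only.
final class ArrayIndex {
    // Each entry is (value << 32) | originalColumnIndex, sorted ascending.
    private final long[] packed;

    ArrayIndex(int[] row1) {
        packed = new long[row1.length];
        for (int i = 0; i < row1.length; i++) {
            // The value goes in the high 32 bits, so sorting the longs sorts by value
            // (signed), with ties broken by column index.
            packed[i] = ((long) row1[i] << 32) | i;
        }
        Arrays.sort(packed);
    }

    /** Returns the smallest original column index whose row-1 value equals key, or -1. */
    int find(int key) {
        long probe = (long) key << 32; // smallest possible packed entry for this key
        int pos = Arrays.binarySearch(packed, probe);
        if (pos < 0) {
            pos = -pos - 1; // insertion point: first entry greater than probe
        }
        if (pos < packed.length && (int) (packed[pos] >> 32) == key) {
            return (int) packed[pos]; // low 32 bits hold the column index
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] row1 = {30, -5, 10, 30};
        ArrayIndex index = new ArrayIndex(row1);
        System.out.println(index.find(10));
        System.out.println(index.find(7));
    }
}
```

The returned column index can then be used directly against the int[] arrays holding rows 2 and 3, so the original data does not need to be reordered.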
