I asked a similar question about Python's Dask earlier and learned that Dask doesn't support mutating a matrix/2D array. So I'm interested in whether this can be achieved in Julia. The scenario is:
- I'll go through a large text (a genome, actually) and find pairs of words (DNA sequences) that appear in the same window. The window will scan the whole genome, so it will generate a very large number of word pairs: roughly a billion windows, each producing 200 pairs, for an expected total of around 200 billion pairs.
- For each pair, I want to update a count (co-occurrence) matrix, which is expected to be around 65,000 x 65,000.
- The matrix is not sparse; it will be dense.
- When the matrix is complete, it will be used for an SVD calculation.
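To make the scale concrete, here is a rough sketch of what I have in mind, using the `Mmap` standard library. The `UInt32` counter type, the toy matrix size, the hard-coded pairs, and the helper name `count_pairs!` are all my own assumptions for illustration. It only demonstrates the update pattern on a tiny file; it says nothing about how this behaves at full size.

```julia
using Mmap

# Back-of-the-envelope sizing (assumed: n = 65_000 and 4-byte UInt32 counters).
n = 65_000
full_size_gb = n^2 * sizeof(UInt32) / 1e9   # about 16.9 GB for the dense matrix
total_pairs = 1_000_000_000 * 200           # about 200 billion updates

# Hypothetical helper: increment one cell per (row, column) pair.
function count_pairs!(counts::AbstractMatrix{<:Integer}, pairs)
    for (i, j) in pairs
        counts[i, j] += 1
    end
    return counts
end

# Toy-scale demo on a file-backed matrix (16 x 16 instead of 65,000 x 65,000).
small_n = 16
path = tempname()
io = open(path, "w+")
counts = Mmap.mmap(io, Matrix{UInt32}, (small_n, small_n))  # file is grown to fit
fill!(counts, 0)

# Stand-in for the stream of word-pair indices coming from the windows.
count_pairs!(counts, [(1, 2), (1, 2), (3, 4)])

Mmap.sync!(counts)  # flush the in-memory changes back to the file
close(io)
```

With these assumed numbers, the average count per cell would be around 47, so `UInt32` should not overflow on average, though I have not checked the worst case for very frequent pairs.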
So, can I achieve this in Julia? Is there a feasible way to fill such a matrix from streaming data? I'm guessing it should use memory-mapped, disk-based storage, but I'm not sure whether such approaches will tolerate this many updates. When I looked for possible solutions, I couldn't find packages that fit the bill. The Julia ecosystem is getting better by the day, so I wanted to ask again. Thanks.