I've been looking for a way to create a custom h5py array that is symmetric. Ideally, a[i][j] and a[j][i] would both point to a single stored value. The reason is that I will be writing a large distance vector into a square-form matrix. Neither the vector nor the square matrix fits in memory, so I would like a relatively fast way to create the square matrix.

Mario S
biophetik

1 Answer

I would suggest doing this with a bit of extra logic: use a 1D array to store just the upper triangle of the matrix (diagonal included). Map between the index in the 1D array and the indices in the 2D array like this:

[[0  1  2  3 ]
 [x  4  5  6 ]
 [x  x  7  8 ]
 [x  x  x  9 ]]

You can write a function for this as:

from __future__ import division


def tri_ravel_factory(n_cols):
    """Return a function mapping 2D indices (j, k) to the 1D upper-triangle index."""
    def tri_ravel(j, k):
        assert j < n_cols, 'j out of range'
        assert k < n_cols, 'k out of range'
        assert j >= 0, 'j out of range'
        assert k >= 0, 'k out of range'
        # (j, k) and (k, j) share one slot, so always look up the upper triangle
        if k < j:
            j, k = k, j
        # closed form of: sum(n_cols - tmp for tmp in range(0, j)) + (k - j)
        return j * n_cols - (j * (j-1))//2 + (k-j)
    return tri_ravel


test_ravel = tri_ravel_factory(4)
indx = test_ravel(1, 0)  # same slot as (0, 1), i.e. index 1
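
As a quick sanity check, the sketch below rebuilds the full 4x4 index table from the same formula, so the mirrored lower triangle becomes visible, and computes how many entries the 1D array needs. The names `tri_index` and `tri_size` are made up for this example and are not part of h5py or NumPy.

```python
def tri_index(j, k, n_cols):
    """Flat index of (j, k) in a row-major upper triangle (diagonal included)."""
    if k < j:
        j, k = k, j  # mirror lower-triangle lookups onto the upper triangle
    return j * n_cols - (j * (j - 1)) // 2 + (k - j)


def tri_size(n_cols):
    """Number of stored entries: n + (n - 1) + ... + 1."""
    return n_cols * (n_cols + 1) // 2


n = 4
# Every (j, k) pair, including the lower triangle, mapped to its 1D slot
table = [[tri_index(j, k, n) for k in range(n)] for j in range(n)]
for row in table:
    print(row)
print(tri_size(n))
```

The upper triangle of `table` should match the layout drawn above, with each lower-triangle entry repeating the index of its mirror image.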

This only saves you a factor of two in storage. You might be better off with a sparse array, computing the distances you need on the fly, or finding a way to avoid computing most of the distances (for example, if you only care about pairs with distance < r).
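
If you do keep the 1D layout and later need whole rows (say, to sort them), the index logic can be hidden behind a small wrapper. This is only a rough sketch: `SymmetricTriangle` and its `row` method are hypothetical names, and the store is assumed to be anything that supports `len()` and integer indexing, such as a Python list, a NumPy array, or a 1D h5py dataset. A plain list stands in for the dataset here.

```python
class SymmetricTriangle:
    """Symmetric 2D view over a 1D store holding only the upper triangle."""

    def __init__(self, store, n_cols):
        expected = n_cols * (n_cols + 1) // 2
        if len(store) != expected:
            raise ValueError("store must hold %d entries" % expected)
        self._store = store
        self._n = n_cols

    def _flat(self, j, k):
        if not (0 <= j < self._n and 0 <= k < self._n):
            raise IndexError("index out of range")
        if k < j:
            j, k = k, j  # (j, k) and (k, j) share one slot
        return j * self._n - (j * (j - 1)) // 2 + (k - j)

    def __getitem__(self, jk):
        j, k = jk
        return self._store[self._flat(j, k)]

    def __setitem__(self, jk, value):
        j, k = jk
        self._store[self._flat(j, k)] = value

    def row(self, j):
        """Gather the full row j, pulling lower-triangle values from their mirrors."""
        return [self[j, k] for k in range(self._n)]


n = 4
sym = SymmetricTriangle([0.0] * (n * (n + 1) // 2), n)  # list as a stand-in store
sym[0, 2] = 5.0
sym[3, 0] = 1.5  # written through the lower triangle
sym[0, 1] = 3.0
print(sym[2, 0])           # reads the value written at (0, 2)
print(sorted(sym.row(0)))  # one row, sorted in memory
```

Reading one element at a time from a disk-backed store is likely to be slow, so for real data you would probably want `row` to fetch contiguous slices instead. The sketch only shows where the mirroring logic would live.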

tacaswell
  • That is what I originally had planned, but in the end I wanted to be able to sort along each row. Having both sides would be much easier than having extra logic to extract values for the lower triangle in order to sort a particular row. – biophetik Jun 18 '13 at 16:48
  • @biophetik Did you give up on trying to do this altogether then? You could wrap this logic up in a class with the data and then expose an interface that lets you do the sorting pretty easily. – tacaswell Jun 18 '13 at 17:58
  • Yeah, I actually just switched the methodology to completely remove this problem. It still interests me, however. I remember seeing an article where they created a new class from the h5py array which included the mirroring of the array. Essentially they pointed the value to both the upper half and the corresponding lower half of the matrix. I never found the post again though... maybe it was a dream. – biophetik Jun 24 '13 at 21:03
  • @biophetik Do you remember if this was done at the hdf level (which I can't think of how to do) or by sub-classing `np.ndarray`? – tacaswell Jun 24 '13 at 21:15
  • I believe it was at the low level of h5py, using http://www.h5py.org/docs/low/h5d.html. – biophetik Jun 26 '13 at 00:51