
I recently started using numpy memmap to link arrays in my project, since I have a 3-dimensional tensor with a total of 133 billion values for one graph of the dataset I am using as an example.

I am trying to calculate the heat kernel signature of a graph with 5748 nodes (the 21st graph of the DD dataset). My code to calculate the projectors (where I use memmap) is:

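from pathlib import Path
import numpy as np

Path('D:/hks_temp').mkdir(parents=True, exist_ok=True)
for l, ll in enumerate(L):
    # Projector for the l-th group of eigenvector indices: sum of outer products
    pl = np.zeros((n, n))
    for k in ll:
        pl += np.outer(evecs[:, k], evecs[:, k])
    # Write the projector to disk as raw float32 (np.memmap adds no .npy header)
    fp = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='w+', shape=(n, n))
    fp[:] = pl[:]
    fp.flush()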

Each X_hks.npy file contains an n by n ndarray (5748 * 5748 in this example).

Then I want all these computed arrays to form the 3-dimensional tensor, so I "link" them (I don't know if that's the right term) in this way:

P = np.array([None] * len(L))    # len(L) = 4043
for l in range(len(L)):
    P[l] = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='r', shape=(n, n))

P is used later only inside a loop, to compute H = np.einsum('ijk,i->jk', P, np.exp(-unique_eval * t)).

However, that raises an error: ValueError: einstein sum subscripts string contains too many subscripts for operand 0. Since the method works for smaller graphs that don't require memmap, my thought was that P isn't structured the way numpy expects and that I must rearrange the data, maybe with a reshape. So I tried P.reshape(len(L), n, n), but that fails with ValueError: cannot reshape array of size 4043 into shape (4043,5748,5748). How can I make it work?
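To make the two errors easier to reproduce, here is a minimal sketch with tiny sizes and in-memory arrays instead of memmaps (the values of `n` and `m` are stand-ins, not the real 5748 and 4043). It suggests that `P` built this way is a 1-D array of `m` Python objects rather than an `m` by `n` by `n` numeric array, which would explain both messages.

```python
import numpy as np

n, m = 4, 3  # tiny stand-ins for n = 5748 and len(L) = 4043

# Same construction as above, with plain arrays in place of memmaps.
P = np.array([None] * m)
for l in range(m):
    P[l] = np.ones((n, n), dtype='float32')

# P has one axis of length m; each element is a separate 2-D array object.
print(P.shape, P.dtype)  # (3,) object

# reshape only sees m elements, so it cannot produce m * n * n entries.
try:
    P.reshape(m, n, n)
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)
print(reshape_error)
```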

I already found this question but it doesn't fit this case. I don't think I can store everything inside one big object, since the memmap files take up 497GB in total (126MB each). If that is possible, please tell me.

If this is impossible I will reduce the use case, but I am quite interested in making it work in general.
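For reference, this is a small sketch of what "one big object" could look like: a single `np.memmap` opened with a 3-D shape and filled one slice at a time. The file name `hks_all.dat`, the temporary directory, the random vectors, and the weights `w` are all hypothetical and chosen only for illustration. The file on disk would need the same total space as the separate files, and I have not checked how `einsum` behaves on it at the full 497GB size.

```python
import os
import tempfile
import numpy as np

n, m = 4, 3  # tiny stand-ins for n = 5748 and len(L) = 4043
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), 'hks_all.dat')  # hypothetical file name

# One file for all projectors, created once with the full 3-D shape.
P = np.memmap(path, dtype='float32', mode='w+', shape=(m, n, n))
for l in range(m):
    v = rng.standard_normal(n)
    P[l] = np.outer(v, v)  # only one n-by-n slice is built in RAM at a time
P.flush()
del P  # close the writable mapping before reopening

# Reopen read-only; this is a real 3-D float32 array as far as numpy is concerned.
P = np.memmap(path, dtype='float32', mode='r', shape=(m, n, n))
w = np.exp(-np.arange(m, dtype='float32'))  # stand-in for np.exp(-unique_eval * t)
H = np.einsum('ijk,i->jk', P, w)
print(P.shape, H.shape)
```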

Ripper346
  • You create `P` as an object dtype array of `len(L)` shape. It is 1d. Just because an element (or all) is 2d, does not mean you can index it as 3d. Even if you get the indexing right, you'll find that `einsum` can operate on `object` dtype. – hpaulj Feb 04 '21 at 17:41
  • @hpaulj so it's impossible in this way? I should do maybe a manual `einsum`, giving manually the indices. Totally inefficient (+ having these sizes) but doable... – Ripper346 Feb 04 '21 at 17:46
  • correction: `einsum` cannot operate on object dtype arrays. – hpaulj Feb 04 '21 at 17:59
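The "manual `einsum`" mentioned in the comments could be sketched as a plain accumulation loop, since `'ijk,i->jk'` is a weighted sum of the `n` by `n` slices. The helper name `weighted_sum` and the toy data are assumptions made for illustration; the loop only needs something it can iterate over slice by slice, so a list or an object array of memmaps should work as `slices`.

```python
import numpy as np

def weighted_sum(slices, weights):
    """Return sum_i weights[i] * slices[i] without building a 3-D array."""
    H = np.zeros(slices[0].shape, dtype='float64')
    for P_i, w_i in zip(slices, weights):
        H += w_i * P_i  # touches a single n-by-n slice per iteration
    return H

# Toy data standing in for the memmapped projectors and the exponential weights.
n, m = 4, 3
rng = np.random.default_rng(0)
slices = [rng.standard_normal((n, n)).astype('float32') for _ in range(m)]
weights = np.exp(-np.arange(m) * 0.5)

H_loop = weighted_sum(slices, weights)

# On small inputs the result can be compared with einsum on a stacked array.
H_einsum = np.einsum('ijk,i->jk', np.stack(slices), weights)
print(np.allclose(H_loop, H_einsum, atol=1e-5))
```

With the object array from the question, the call would presumably be `H = weighted_sum(P, np.exp(-unique_eval * t))`.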
